Interference-Aware Scheduling for Inference Serving

Authors:

Mendoza Daniel¹, Romero Francisco¹, Li Qian¹, Yadwadkar Neeraja J.¹, Kozyrakis Christos¹

Affiliation:

1. Stanford University

Publisher:

ACM

References (33 articles):

1. 2018. NVIDIA TensorRT: Programmable Inference Accelerator. https://developer.nvidia.com/tensorrt.

2. AWS. [n.d.]. AWS Neuron. https://github.com/aws/aws-neuron-sdk.

3. AWS. 2018. AWS Inferentia. https://aws.amazon.com/machine-learning/inferentia/.

4. AWS. 2019. Deliver high performance ML inference with AWS Inferentia. https://d1.awsstatic.com/events/reinvent/2019/REPEAT_1_Deliver_high_performance_ML_inference_with_AWS_Inferentia_CMP324-R1.pdf.

Cited by 16 articles:

1. ESEN: Efficient GPU sharing of Ensemble Neural Networks. Neurocomputing, 2024-09.

2. Model Selection for Latency-Critical Inference Serving. Proceedings of the Nineteenth European Conference on Computer Systems, 2024-04-22.

3. Deep Learning Workload Scheduling in GPU Datacenters: A Survey. ACM Computing Surveys, 2024-01-22.

4. Cloud-Native Computing: A Survey From the Perspective of Services. Proceedings of the IEEE, 2024-01.

5. Miriam: Exploiting Elastic Kernels for Real-time Multi-DNN Inference on Edge GPU. Proceedings of the 21st ACM Conference on Embedded Networked Sensor Systems, 2023-11-12.
