
Interference-Aware Scheduling for Inference Serving

Published: 2021-04-26
Container-title: Proceedings of the 1st Workshop on Machine Learning and Systems

Author:

Mendoza Daniel¹, Romero Francisco¹, Li Qian¹, Yadwadkar Neeraja J.¹, Kozyrakis Christos¹

Affiliation:

1. Stanford University

Publisher

ACM

Link

https://dl.acm.org/doi/pdf/10.1145/3437984.3458837

References: 33 articles.

1. 2018. NVIDIA TensorRT: Programmable Inference Accelerator. https://developer.nvidia.com/tensorrt

2. AWS [n.d.]. AWS Neuron. https://github.com/aws/aws-neuron-sdk

3. AWS 2018. AWS Inferentia. https://aws.amazon.com/machine-learning/inferentia/

4. AWS 2019. Deliver high performance ML inference with AWS Inferentia. https://d1.awsstatic.com/events/reinvent/2019/REPEAT_1_Deliver_high_performance_ML_inference_with_AWS_Inferentia_CMP324-R1.pdf

Cited by 16 articles.

1. ESEN: Efficient GPU sharing of Ensemble Neural Networks; Neurocomputing; 2024-09

2. Model Selection for Latency-Critical Inference Serving; Proceedings of the Nineteenth European Conference on Computer Systems; 2024-04-22

3. Deep Learning Workload Scheduling in GPU Datacenters: A Survey; ACM Computing Surveys; 2024-01-22

4. Cloud-Native Computing: A Survey From the Perspective of Services; Proceedings of the IEEE; 2024-01

5. Miriam: Exploiting Elastic Kernels for Real-time Multi-DNN Inference on Edge GPU; Proceedings of the 21st ACM Conference on Embedded Networked Sensor Systems; 2023-11-12