1. Giant leaps in performance and efficiency for AI services, from the data center to the network’s edge, 2019.
2. R. Xu, F. Han, Q. Ta, Deep learning at scale on NVIDIA V100 accelerators, in: Proc. of 2018 IEEE/ACM Performance Modeling, Benchmarking and Simulation of High Performance Computer Systems (PMBS18), 2018.
3. D. Crankshaw, X. Wang, G. Zhou, M.J. Franklin, J.E. Gonzalez, I. Stoica, Clipper: A low-latency online prediction serving system, in: Proc. of 14th USENIX Symposium on Networked Systems Design and Implementation, 2017, pp. 613–627.
4. C. Olston, N. Fiedel, K. Gorovoy, J. Harmsen, L. Lao, F. Li, V. Rajashekhar, S. Ramesh, J. Soyke, TensorFlow-Serving: Flexible, high-performance ML serving, in: Proc. of the Workshop on ML Systems at NIPS 2017, 2017.
5. NVIDIA TensorRT Inference Server, 2019.