1. NVIDIA, Inc., NVIDIA Triton Inference Server, 2023.
2. C. Olston et al., TensorFlow-Serving: Flexible, high-performance ML serving, 2017.
3. W. Cui, H. Zhao, Q. Chen, H. Wei, Z. Li, D. Zeng, C. Li, M. Guo, DVABatch: Diversity-aware Multi-Entry Multi-Exit Batching for Efficient Processing of DNN Services on GPUs, in: 2022 USENIX Annual Technical Conference, 2022, pp. 183–198.
4. K. He, X. Zhang, S. Ren, J. Sun, Deep residual learning for image recognition, in: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, 2016, pp. 770–778.
5. J. Devlin, M.-W. Chang, K. Lee, K. Toutanova, BERT: Pre-training of deep bidirectional transformers for language understanding, 2018.