1. Giant leaps in performance and efficiency for AI services, from the data center to the network’s edge, 2019.
2. R. Xu, F. Han, Q. Ta, Deep learning at scale on NVIDIA V100 accelerators, in: Proc. of 2018 IEEE/ACM Performance Modeling, Benchmarking and Simulation of High Performance Computer Systems (PMBS18), 2018.
3. D. Crankshaw, X. Wang, G. Zhou, M.J. Franklin, J.E. Gonzalez, I. Stoica, Clipper: A low-latency online prediction serving system, in: Proc. of 14th USENIX Symposium on Networked Systems Design and Implementation, 2017, pp. 613–627.
4. C. Olston, N. Fiedel, K. Gorovoy, J. Harmsen, L. Lao, F. Li, V. Rajashekhar, S. Ramesh, J. Soyke, TensorFlow-Serving: Flexible, high-performance ML serving, in: Proc. of the Workshop on ML Systems at NIPS 2017, 2017.
5. NVIDIA TensorRT Inference Server, 2019.