1. Analysis and simulation of a fair queueing algorithm
2. Accelerating large scale deep learning inference through DeepCPU at Microsoft;zhang;Proceedings of the USENIX Conference on Operational Machine Learning (OpML),2019
3. Accelerating reduction and scan using tensor core units
4. MArk: Exploiting cloud services for cost-effective, SLO-aware machine learning inference serving;zhang
5. Reduce inference costs on Amazon EC2 for PyTorch models with Amazon Elastic Inference;fan;AWS Machine Learning,2020