1. Deep Residual Learning for Image Recognition
2. BERT: Pre-training of deep bidirectional transformers for language understanding; Devlin
3. MLaaS in the wild: Workload analysis and scheduling in large-scale heterogeneous GPU clusters; Weng; NSDI, 2022
4. Accurate, large minibatch SGD: Training ImageNet in 1 hour; Goyal; 2017
5. Highly scalable deep learning training system with mixed-precision: Training ImageNet in four minutes; Jia