1. T. Ben-Nun, T. Hoefler, Demystifying parallel and distributed deep learning: An in-depth concurrency analysis, ACM Comput. Surv., 2019.
2. P. Goyal, et al., Accurate, large minibatch SGD: Training ImageNet in 1 hour, arXiv preprint, 2017.
3. P. Sun, et al., Optimizing network performance for distributed DNN training on GPU clusters: ImageNet/AlexNet training in 1.5 minutes, arXiv preprint, 2019.
4. Y. You, J. Li, S. Reddi, J. Hseu, S. Kumar, S. Bhojanapalli, X. Song, J. Demmel, K. Keutzer, C.-J. Hsieh, Large batch optimization for deep learning: Training bert in 76 minutes, in: International Conference on Learning Representations, 2020.
5. S. Shi, et al., Performance modeling and evaluation of distributed deep learning frameworks on GPUs, 2018.