1. Simonyan K, Zisserman A (2015) Very deep convolutional networks for large-scale image recognition. In: Proceedings of the 3rd International Conference on Learning Representations, ICLR
2. Dean J, Corrado G, Monga R, Chen K, Devin M, Mao MZ, Ranzato M, Senior AW, Tucker PA, Yang K, et al (2012) Large scale distributed deep networks. In: Advances in Neural Information Processing Systems 25, 1232–1240
3. Ho Q, Cipar J, Cui H, Lee S, Kim JK, Gibbons PB, Gibson GA, Ganger G, Xing EP (2013) More effective distributed ML via a stale synchronous parallel parameter server. In: Advances in Neural Information Processing Systems 26, 1223–1231
4. Li M, Andersen DG, Park JW, Smola AJ, Ahmed A, Josifovski V, Long J, Shekita EJ, Su B-Y (2014) Scaling distributed machine learning with the parameter server. In: Proceedings of the 11th USENIX Conference on Operating Systems Design and Implementation, OSDI’14, 583–598
5. Huang Y, Cheng Y, Bapna A, Firat O, Chen D, Chen M, Lee H, Ngiam J, Le QV, Wu Y, Chen Z (2019) GPipe: Efficient training of giant neural networks using pipeline parallelism. In: Advances in Neural Information Processing Systems 32, 103–112