1. L. Bottou, F. E. Curtis, J. Nocedal, Optimization Methods for Large-Scale Machine Learning, SIAM Review, 2018.
2. G. E. Hinton, R. R. Salakhutdinov, Reducing the Dimensionality of Data with Neural Networks, Science, 2006.
3. Y. Huang, Y. Cheng, A. Bapna, O. Firat, M. X. Chen, D. Chen, H. Lee, J. Ngiam, Q. V. Le, Y. Wu, Z. Chen, GPipe: Efficient Training of Giant Neural Networks using Pipeline Parallelism, arXiv:1811.06965 [cs]. http://arxiv.org/abs/1811.06965.
4. S. Kornblith, J. Shlens, Q. V. Le, Do Better ImageNet Models Transfer Better?, arXiv:1805.08974 [cs, stat]. http://arxiv.org/abs/1805.08974.