1. He et al., “Deep residual learning for image recognition,” 2016.
2. Hinton et al., “Deep neural networks for acoustic modeling in speech recognition: The shared views of four research groups,” IEEE Signal Process. Mag., 2012.
3. A. Vaswani, N. Shazeer, N. Parmar, J. Uszkoreit, L. Jones, A. N. Gomez, Ł. Kaiser, and I. Polosukhin, “Attention is all you need,” Advances in Neural Information Processing Systems, vol. 30, 2017.
4. Choromanska et al., “Open problem: The landscape of the loss surfaces of multilayer networks,” 2015.
5. Chaudhari et al., “Entropy-SGD: Biasing gradient descent into wide valleys,” J. Stat. Mech.: Theory Exp., 2019.