1. Dauphin, Identifying and attacking the saddle point problem in high-dimensional non-convex optimization, in: Adv. Neural Inf. Process. Syst., 2014.
2. Goodfellow, Qualitatively characterizing neural network optimization problems, 2014.
3. LeCun, Handwritten digit recognition with a back-propagation network, in: Adv. Neural Inf. Process. Syst., 1990.
4. A. Brutzkus, A. Globerson, E. Malach, S. Shalev-Shwartz, SGD learns over-parameterized networks that provably generalize on linearly separable data, in: 6th International Conference on Learning Representations, ICLR 2018, Vancouver, BC, Canada, April 30–May 3, 2018, Conf. Track Proc., 2018.
5. Li, Learning overparameterized neural networks via stochastic gradient descent on structured data, in: Adv. Neural Inf. Process. Syst., 2018.