1. Gardner, W.A. (1984). Learning characteristics of stochastic-gradient-descent algorithms: A general study, analysis, and critique. Signal Processing.
2. Oymak, S., and Soltanolkotabi, M. (2019, June 9–15). Overparameterized nonlinear learning: Gradient descent takes the shortest path? Proceedings of the 36th International Conference on Machine Learning (ICML), PMLR, Long Beach, CA, USA.
3. Ali, A., Dobriban, E., and Tibshirani, R. (2020, July 13–18). The implicit regularization of stochastic gradient flow for least squares. Proceedings of the 37th International Conference on Machine Learning (ICML), PMLR, Virtual.
4. Douglas, S., and Meng, T.Y. (1991, July 8–12). Linearized least-squares training of multilayer feedforward neural networks. Proceedings of the IJCNN-91-Seattle International Joint Conference on Neural Networks, Seattle, WA, USA.
5. Singhal, S., and Wu, L. (1988). Training multilayer perceptrons with the extended Kalman algorithm. Advances in Neural Information Processing Systems.