1. 2016 An overview of gradient descent optimization algorithms;Ruder;(arXiv:1609.04747)
2. 2020 An Overview of Gradient Descent Algorithm Optimization in Machine Learning: Application in the Ophthalmology Field;Aatila
3. 2017 Stochastic Gradient Descent as Approximate Bayesian Inference;Mandt;(arXiv:1704.04289)
4. Nesterov’s Accelerated Gradient and Momentum as approximations to Regularised Update Descent;Botev;International Joint Conference on Neural Networks (IJCNN)
5. On the momentum term in gradient descent learning algorithms