1. N. Fatima, “Enhancing performance of a deep neural network: A comparative analysis of optimization algorithms,” Advances in Distributed Computing and Artificial Intelligence Journal, vol. 9, no. 2, pp. 79–90, 2020.
2. H. Robbins and S. Monro, “A stochastic approximation method,” Annals of Mathematical Statistics, vol. 22, no. 3, pp. 400–407, 1951.
3. N. Qian, “On the momentum term in gradient descent learning algorithms,” Neural Networks, vol. 12, no. 1, pp. 145–151, 1999.
4. J. Duchi, E. Hazan, and Y. Singer, “Adaptive subgradient methods for online learning and stochastic optimization,” Journal of Machine Learning Research, vol. 12, pp. 2121–2159, 2011.
5. D. P. Kingma and J. L. Ba, “Adam: A method for stochastic optimization,” Proc. of the 3rd International Conference on Learning Representations, San Diego, 2015, pp. 1–15.