1. Optimization for deep learning: an overview;Sun;J. Oper. Res. Soc. China,2020
2. A comparison of optimization algorithms for deep learning;Soydaner;Int. J. Pattern Recognit. Artif. Intell.,2020
3. A stochastic approximation method;Robbins;Ann. Math. Stat.,1951
4. On the momentum term in gradient descent learning algorithms;Qian;Neural Netw.,1999
5. A method for unconstrained convex minimization problem with the rate of convergence O(1/k^2);Nesterov,1983