1. Ruder S.: ‘An overview of gradient descent optimization algorithms’. arXiv preprint arXiv:1609.04747, 2017
2. On the momentum term in gradient descent learning algorithms
3. A method for unconstrained convex minimization problem with the rate of convergence O(1/k^2);Nesterov Y.;Doklady AN SSSR (Translated as Soviet Math. Dokl.),1983
4. Adaptive subgradient methods for online learning and stochastic optimization;Duchi J.;J. Mach. Learn. Res.,2011
5. Zeiler M.D.: ‘ADADELTA: an adaptive learning rate method’. arXiv preprint arXiv:1212.5701, 2012