Affiliation:
1. Faculty of Mathematics and Computer Science, University of Münster, 48149 Münster, Germany
2. Division of Mathematical Sciences, NTU Singapore, Singapore 637371, Singapore
3. Department of Mathematics, ETH Zurich, 8092 Zurich, Switzerland
Abstract
Stochastic gradient descent (SGD) optimization algorithms are key ingredients in a wide range of machine learning applications. In this article we perform a rigorous strong error analysis for SGD optimization algorithms. In particular, we prove for every arbitrarily small $\varepsilon \in (0,\infty)$ and every arbitrarily large $p \in (0,\infty)$ that the considered SGD optimization algorithm converges in the strong $L^p$-sense with order $1/2-\varepsilon$ to the global minimum of the objective function of the considered stochastic optimization problem, under standard convexity-type assumptions on the objective function and relaxed assumptions on the moments of the stochastic errors appearing in the employed SGD optimization algorithm. The key ideas in our convergence proof are, first, to employ techniques from the theory of Lyapunov-type functions for dynamical systems to develop a general convergence machinery for SGD optimization algorithms based on such functions, then, to apply this general machinery to concrete Lyapunov-type functions with polynomial structures, and, thereafter, to perform an induction argument along the powers appearing in the Lyapunov-type functions in order to achieve strong $L^p$-convergence rates for every arbitrarily large $p \in (0,\infty)$.
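For orientation, the following display sketches the standard SGD recursion and the form of the convergence statement described above; the notation ($\Theta_n$, $\gamma_n$, $G$, $X_n$, $f$, $\vartheta$, $C$) is chosen here for illustration only and is not taken verbatim from the article:
\[
\Theta_{n+1} = \Theta_n - \gamma_{n+1}\, G(\Theta_n, X_{n+1}), \qquad n \in \mathbb{N}_0,
\]
where $(\gamma_n)_{n \in \mathbb{N}} \subseteq (0,\infty)$ are step sizes, $(X_n)_{n \in \mathbb{N}}$ are i.i.d. random samples, and $G$ is an unbiased estimator of the gradient of the objective function $f$. Writing $\vartheta$ for the global minimum of $f$, a strong $L^p$-convergence rate of order $1/2-\varepsilon$ then means that there exists $C \in (0,\infty)$ such that
\[
\big(\mathbb{E}\big[\|\Theta_n - \vartheta\|^p\big]\big)^{1/p} \le C\, n^{-(1/2-\varepsilon)} \qquad \text{for all } n \in \mathbb{N}.
\]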
Funder
Swiss National Science Foundation
Publisher
Oxford University Press (OUP)
Subject
Applied Mathematics, Computational Mathematics, General Mathematics