Strong error analysis for stochastic gradient descent optimization algorithms

Published: 2020-05-20 Issue: 1 Volume: 41 Page: 455-492
ISSN: 0272-4979
Container-title: IMA Journal of Numerical Analysis
Language: en

Author:

Jentzen Arnulf¹, Kuckuck Benno¹, Neufeld Ariel², von Wurstemberger Philippe³

Affiliation:

1. Faculty of Mathematics and Computer Science, University of Münster, 48149 Münster, Germany

2. Division of Mathematical Sciences, NTU Singapore, Singapore 637371, Singapore

3. Department of Mathematics, ETH Zurich, 8092 Zurich, Switzerland

Abstract

Stochastic gradient descent (SGD) optimization algorithms are key ingredients in a series of machine learning applications. In this article we perform a rigorous strong error analysis for SGD optimization algorithms. In particular, we prove for every arbitrarily small $\varepsilon \in (0,\infty)$ and every arbitrarily large $p \in (0,\infty)$ that the considered SGD optimization algorithm converges in the strong $L^p$-sense with order $1/2-\varepsilon$ to the global minimum of the objective function of the considered stochastic optimization problem, under standard convexity-type assumptions on the objective function and relaxed assumptions on the moments of the stochastic errors appearing in the employed SGD optimization algorithm. The key ideas in our convergence proof are: first, to employ techniques from the theory of Lyapunov-type functions for dynamical systems to develop a general convergence machinery for SGD optimization algorithms based on such functions; then, to apply this general machinery to concrete Lyapunov-type functions with polynomial structures; and, thereafter, to perform an induction argument along the powers appearing in the Lyapunov-type functions in order to achieve strong $L^p$-convergence rates for every arbitrarily large $p \in (0,\infty)$.
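
To make the notion of a strong $L^p$ convergence order concrete, the following minimal sketch estimates it empirically on a toy problem. Everything in it is an illustrative assumption rather than the article's setting: the objective $f(\theta) = \tfrac{1}{2}\,\mathbb{E}[(\theta - X)^2]$ with Gaussian $X$, the learning rates $c/n$, and the hypothetical function names `sgd_final_error`, `strong_lp_error` and `empirical_order`. It approximates $(\mathbb{E}[|\Theta_n - \vartheta|^p])^{1/p}$ by Monte Carlo for two step counts and reads off the slope on a log-log scale.

```python
import math
import random


def sgd_final_error(n_steps, rng, target=1.0, noise_std=1.0, c=1.0):
    """Run SGD on the toy objective f(theta) = E[(theta - X)^2] / 2 with
    X ~ N(target, noise_std^2) and return |theta_n - target|.

    The objective, the start value 0.0 and the learning rates c / n are
    assumptions made for this illustration only.
    """
    theta = 0.0
    for n in range(1, n_steps + 1):
        x = rng.gauss(target, noise_std)
        grad = theta - x          # unbiased stochastic gradient of f at theta
        theta -= (c / n) * grad   # SGD step with decaying learning rate c / n
    return abs(theta - target)


def strong_lp_error(n_steps, p=4.0, n_runs=1000, seed=0):
    """Monte Carlo estimate of the strong L^p error (E|theta_n - target|^p)^(1/p)."""
    rng = random.Random(seed)
    moments = [sgd_final_error(n_steps, rng) ** p for _ in range(n_runs)]
    return (sum(moments) / n_runs) ** (1.0 / p)


def empirical_order(n_small=100, n_large=1600, p=4.0):
    """Slope of log(error) against log(number of steps) between two step counts."""
    e_small = strong_lp_error(n_small, p, seed=0)
    e_large = strong_lp_error(n_large, p, seed=1)
    return math.log(e_small / e_large) / math.log(n_large / n_small)


if __name__ == "__main__":
    print("estimated strong L^4 convergence order:", empirical_order())
```

On this particular toy problem the estimated order should come out close to $1/2$, up to Monte Carlo noise. This is in line with the order $1/2-\varepsilon$ stated in the abstract, but it is only a numerical illustration on a one-dimensional quadratic and does not reproduce the article's assumptions or proof.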

Funder

Swiss National Science Foundation

Publisher

Oxford University Press (OUP)

Subject

Applied Mathematics, Computational Mathematics, General Mathematics

Link

http://academic.oup.com/imajna/article-pdf/41/1/455/35970895/drz055.pdf
