Stationary Behavior of Constant Stepsize SGD Type Algorithms

Authors:

Zaiwei Chen¹, Shancong Mou¹, Siva Theja Maguluri¹

Affiliation:

1. Georgia Institute of Technology, Atlanta, GA, USA

Abstract

Stochastic approximation (SA) and stochastic gradient descent (SGD) algorithms are workhorses for modern machine learning algorithms. Their constant stepsize variants are preferred in practice due to their fast convergence. However, constant stepsize SA algorithms do not converge to the optimal solution; instead, they have a stationary distribution, which in general cannot be analytically characterized. In this work, we study the asymptotic behavior of the appropriately scaled stationary distribution in the limit as the constant stepsize goes to zero. Specifically, we consider the following three settings: (1) the SGD algorithm with a smooth and strongly convex objective, (2) a linear SA algorithm involving a Hurwitz matrix, and (3) a nonlinear SA algorithm involving a contractive operator. When the iterate is scaled by 1/√α, where α is the constant stepsize, we show that the limiting scaled stationary distribution is a solution of an implicit equation. Under a uniqueness assumption on this equation (which can be removed in certain settings), we further characterize the limiting distribution as a Gaussian distribution whose covariance matrix is the unique solution of a suitable Lyapunov equation. For SA algorithms beyond these cases, our numerical experiments suggest that, unlike central limit theorem type results, (1) the scaling factor need not be 1/√α, and (2) the limiting distribution need not be Gaussian. Based on this numerical study, we propose a heuristic formula for determining the right scaling factor and draw an insightful connection to the Euler-Maruyama discretization scheme for approximating stochastic differential equations.
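To make the Gaussian-limit statement concrete, the following minimal sketch (not taken from the paper) simulates constant stepsize SGD on a two-dimensional strongly convex quadratic with additive Gaussian gradient noise, and compares the empirical covariance of the scaled iterate x/√α against the solution of the Lyapunov equation A Σ + Σ Aᵀ = Γ, which is the standard form the limiting covariance satisfies in this linear-quadratic case. The matrix A, noise covariance Γ, stepsize, and sample counts below are illustrative assumptions.

import numpy as np
from scipy.linalg import solve_continuous_lyapunov

rng = np.random.default_rng(0)

# Illustrative choices (not from the paper): strongly convex quadratic
# f(x) = 0.5 * x^T A x with minimizer x* = 0, and additive Gaussian
# gradient noise with covariance Gamma.
d = 2
A = np.array([[2.0, 0.5], [0.5, 1.0]])
Gamma = np.eye(d)
L = np.linalg.cholesky(Gamma)

alpha = 0.01                      # constant stepsize
burn_in, n_samples = 50_000, 500_000

x = np.zeros(d)
samples = np.empty((n_samples, d))
for k in range(burn_in + n_samples):
    noise = L @ rng.standard_normal(d)
    x = x - alpha * (A @ x + noise)    # constant stepsize SGD step
    if k >= burn_in:
        samples[k - burn_in] = x

# Empirical covariance of the scaled iterate x / sqrt(alpha)
emp_cov = np.cov(samples.T / np.sqrt(alpha))

# Covariance predicted by the Lyapunov equation A S + S A^T = Gamma
lyap_cov = solve_continuous_lyapunov(A, Gamma)

print("empirical covariance of x/sqrt(alpha):\n", emp_cov)
print("Lyapunov-equation solution:\n", lyap_cov)

For a small stepsize such as α = 0.01, the two printed matrices should agree up to sampling error and an O(α) correction, consistent with the limiting Gaussian behavior described in the abstract.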

Funder

NSF

Raytheon Technologies

Publisher

Association for Computing Machinery (ACM)

Subject

Computer Networks and Communications; Hardware and Architecture; Safety, Risk, Reliability and Quality; Computer Science (miscellaneous)


Cited by 3 articles.