Non-asymptotic estimates for TUSLA algorithm for non-convex learning with applications to neural networks with ReLU activation function

Authors:

Dong-Young Lim¹, Ariel Neufeld², Sotirios Sabanis³, Ying Zhang⁴

Affiliation:

1. Department of Industrial Engineering, Ulsan National Institute of Science and Technology (UNIST), 112 Engineering Building, 301-14, Ulsan, South Korea

2. Division of Mathematical Sciences, Nanyang Technological University, 21 Nanyang Link, 637371 Singapore

3. School of Mathematics, The University of Edinburgh, James Clerk Maxwell Building, Peter Guthrie Tait Rd, Edinburgh EH9 3FD, UK; The Alan Turing Institute, 96 Euston Rd, London NW1 2DB, UK; and National Technical University of Athens, Athens, 15780, Greece

4. Financial Technology Thrust, Society Hub, The Hong Kong University of Science and Technology (Guangzhou), No. 1 Du Xue Rd, Nansha District, Guangzhou, China; and Division of Mathematical Sciences, Nanyang Technological University, 21 Nanyang Link, 637371 Singapore

Abstract

We consider non-convex stochastic optimization problems where the objective functions have super-linearly growing and discontinuous stochastic gradients. In such a setting, we provide a non-asymptotic analysis for the tamed unadjusted stochastic Langevin algorithm (TUSLA) introduced in Lovas et al. (2020). In particular, we establish non-asymptotic error bounds for the TUSLA algorithm in Wasserstein-1 and Wasserstein-2 distances. The latter result enables us to further derive non-asymptotic estimates for the expected excess risk. To illustrate the applicability of the main results, we consider an example from transfer learning with ReLU neural networks, which represents a key paradigm in machine learning. Numerical experiments are presented for this example and support our theoretical findings. Hence, in this setting, we demonstrate both theoretically and numerically that the TUSLA algorithm can solve the optimization problem involving neural networks with the ReLU activation function. In addition, we present simulation results for synthetic examples where popular algorithms, e.g. ADAM, AMSGrad, RMSProp and (vanilla) stochastic gradient descent, may fail to find the minimizer of the objective function due to the super-linear growth and the discontinuity of the corresponding stochastic gradient, while the TUSLA algorithm converges rapidly to the optimal solution. Moreover, we provide an empirical comparison of the performance of TUSLA with popular stochastic optimizers on real-world datasets, and investigate the effect of the key hyperparameters of TUSLA on its performance.
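
To make the type of update analysed here concrete, the following is a minimal sketch (not the authors' code) of a tamed stochastic Langevin step in the spirit of TUSLA, applied to a toy objective whose stochastic gradient grows super-linearly. The function names and the hyperparameter values (step size lam, inverse temperature beta, regularization constant eta, taming exponent r) are illustrative assumptions, not values taken from the paper.

```python
# Minimal sketch of a tamed stochastic Langevin update in the spirit of
# TUSLA (Lovas et al., 2020); hyperparameters below are illustrative only.
import numpy as np

rng = np.random.default_rng(0)

def stochastic_gradient(theta, x):
    # Toy stochastic gradient growing like |theta|^3 (super-linear),
    # perturbed by the data sample x; its mean is minimized at theta = 0.
    return theta ** 3 + x * theta

def tusla_step(theta, x, lam=1e-3, beta=1e8, eta=1e-4, r=1.0):
    norm = np.linalg.norm(theta)
    # Drift: stochastic gradient plus a polynomial regularization term.
    h = stochastic_gradient(theta, x) + eta * theta * norm ** (2 * r)
    # Taming factor keeps the drift step bounded despite super-linear growth.
    taming = 1.0 + np.sqrt(lam) * norm ** (2 * r)
    # Gaussian exploration noise scaled by the inverse temperature beta.
    noise = np.sqrt(2.0 * lam / beta) * rng.standard_normal(theta.shape)
    return theta - lam * h / taming + noise

theta = np.array([5.0, -5.0])        # start deliberately far from the minimizer
for _ in range(10_000):
    x = rng.standard_normal()         # i.i.d. "data" stream
    theta = tusla_step(theta, x)
print(theta)                          # iterates drift towards the minimizer at 0
```

The taming factor 1 + √λ‖θ‖^{2r} in the denominator bounds each drift step even when the raw gradient grows super-linearly, which is what keeps the iterates stable in regimes where untamed gradient-based methods can diverge.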

Publisher

Oxford University Press (OUP)

Subject

Applied Mathematics, Computational Mathematics, General Mathematics

References (43 articles)

1. Aitchison (2020). A statistical theory of cold posteriors in deep neural networks.

2. Barkhagen et al. (2021). On stochastic gradient Langevin dynamics with dependent data streams in the logconcave case. Bernoulli.

3. Beck (2014). MOS-SIAM Series on Optimization.

4. Brosse et al. (2018). The promises and pitfalls of stochastic gradient Langevin dynamics.

5. Brosse et al. (2019). The tamed unadjusted Langevin algorithm. Stochastic Process. Appl.
