Wide neural networks of any depth evolve as linear models under gradient descent


Lee Jaehoon, Xiao Lechao, Schoenholz Samuel S, Bahri Yasaman, Novak Roman, Sohl-Dickstein Jascha, Pennington Jeffrey


Abstract

A longstanding goal in deep learning research has been to precisely characterize training and generalization. However, the often complex loss landscapes of neural networks (NNs) have made a theory of learning dynamics elusive. In this work, we show that for wide NNs the learning dynamics simplify considerably and that, in the infinite width limit, they are governed by a linear model obtained from the first-order Taylor expansion of the network around its initial parameters. Furthermore, mirroring the correspondence between wide Bayesian NNs and Gaussian processes (GPs), gradient-based training of wide NNs with a squared loss produces test set predictions drawn from a GP with a particular compositional kernel. While these theoretical results are only exact in the infinite width limit, we nevertheless find excellent empirical agreement between the predictions of the original network and those of the linearized version even for finite, practically sized networks. This agreement is robust across different architectures, optimization methods, and loss functions.
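The linearization described in the abstract replaces the network output f(θ, x) by its first-order Taylor expansion around the initial parameters θ0: f_lin(θ, x) = f(θ0, x) + ∇θ f(θ0, x) · (θ − θ0). The sketch below illustrates only that expansion on a toy model. The one-hidden-layer tanh network, the 1/sqrt(width) output scaling, the width, input and seeds, and the names `init_params`, `forward`, `grad`, `linearize` and `gap` are all assumptions chosen for illustration; none of this is code from the paper. It checks numerically that the gap between the network and its linearization shrinks quadratically as the parameters move away from θ0.

```python
import math
import random

# Hypothetical toy model (an assumption, not the paper's architecture):
# f(theta, x) = (1/sqrt(n)) * sum_i w2_i * tanh(w1_i * x), scalar input and output,
# with theta = [w1_1..w1_n, w2_1..w2_n] stored as one flat list.

def init_params(width, seed=0):
    """Draw all 2 * width parameters i.i.d. from N(0, 1)."""
    rng = random.Random(seed)
    return [rng.gauss(0.0, 1.0) for _ in range(2 * width)]

def forward(theta, x):
    """Evaluate the network output f(theta, x)."""
    n = len(theta) // 2
    w1, w2 = theta[:n], theta[n:]
    return sum(b * math.tanh(a * x) for a, b in zip(w1, w2)) / math.sqrt(n)

def grad(theta, x):
    """Analytic gradient of f(theta, x) with respect to the flat theta."""
    n = len(theta) // 2
    w1, w2 = theta[:n], theta[n:]
    s = math.sqrt(n)
    d_w1 = [b * (1.0 - math.tanh(a * x) ** 2) * x / s for a, b in zip(w1, w2)]
    d_w2 = [math.tanh(a * x) / s for a in w1]
    return d_w1 + d_w2

def linearize(theta0, x):
    """Return f_lin, the first-order Taylor expansion of f(., x) around theta0."""
    f0 = forward(theta0, x)
    g0 = grad(theta0, x)

    def f_lin(theta):
        # f(theta0, x) + <grad f(theta0, x), theta - theta0>
        return f0 + sum(g * (t - t0) for g, t, t0 in zip(g0, theta, theta0))

    return f_lin

# Usage: compare f and f_lin along a random direction theta0 + eps * direction.
width = 64
x = 0.7
theta0 = init_params(width, seed=0)
direction = init_params(width, seed=1)
f_lin = linearize(theta0, x)

def gap(eps):
    """|f(theta) - f_lin(theta)| at theta = theta0 + eps * direction."""
    theta = [t + eps * d for t, d in zip(theta0, direction)]
    return abs(forward(theta, x) - f_lin(theta))

for eps in (1e-1, 1e-2, 1e-3):
    print(f"eps={eps:g}  gap={gap(eps):.3e}")
```

Shrinking the perturbation by 10× should shrink the gap by roughly 100×, because the Taylor remainder is second order; a gap that shrinks only about 10× would point to an incorrect gradient. This toy check covers the expansion at a fixed width only. The paper's stronger claim, that during training the parameters stay close enough to θ0 for the linear model to remain accurate as the width grows, is not demonstrated here.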


IOP Publishing


Statistics, Probability and Uncertainty; Statistics and Probability; Statistical and Nonlinear Physics

Cited by 108 articles.