Affiliations:
1. Institute of Mathematics (SB MATH), EPFL, Lausanne, Switzerland
2. Department of Mathematics, ETH Zurich, Zurich, Switzerland
Abstract
This paper studies the infinite-width limit of deep linear neural networks (NNs) initialized with random parameters. We show that, as the number of parameters diverges, the training dynamics converge (in a precise sense) to the dynamics obtained from gradient descent on an infinitely wide deterministic linear NN. Moreover, even though the weights remain random, we identify their precise law along the training dynamics and prove a quantitative convergence result for the linear predictor in terms of the number of parameters. We finally study the continuous-time limit obtained for infinitely wide linear NNs and show that the linear predictors of the NN converge at an exponential rate to the minimal ℓ2-norm minimizer of the risk.
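To make the setting concrete, here is a minimal sketch (ours, not the authors' code) of plain gradient descent on a deep linear network with random Gaussian initialization; it tracks the induced end-to-end linear predictor A = W_1 ⋯ W_L, whose limiting behavior the paper analyzes. The width, depth, step size, and 1/√fan-in initialization scaling below are illustrative assumptions, not values taken from the paper.

```python
# Minimal sketch: gradient descent on a deep linear network x -> x W_1 ... W_L
# (row-vector convention) with random Gaussian initialization, tracking the
# induced end-to-end linear predictor. All hyperparameters are illustrative.
import numpy as np

rng = np.random.default_rng(0)

d_in, d_out, m, L = 5, 1, 256, 3      # input dim, output dim, hidden width, depth
n, eta, steps = 50, 0.05, 2000        # samples, step size, iterations

# Synthetic regression data.
X = rng.standard_normal((n, d_in))
y = X @ rng.standard_normal((d_in, d_out))

# Random initialization, scaled by 1/sqrt(fan-in) so the end-to-end map stays O(1).
dims = [d_in] + [m] * (L - 1) + [d_out]
W = [rng.standard_normal((dims[l], dims[l + 1])) / np.sqrt(dims[l]) for l in range(L)]

def end_to_end(mats):
    """Product W_1 W_2 ... W_L: the linear predictor induced by the network."""
    A = mats[0]
    for M in mats[1:]:
        A = A @ M
    return A

for t in range(steps):
    A = end_to_end(W)
    resid = X @ A - y                          # (n, d_out)
    grad_A = X.T @ resid / n                   # gradient of the risk w.r.t. A
    new_W = []
    for l in range(L):
        # Chain rule: grad w.r.t. W_l is prefix^T grad_A suffix^T,
        # where A = prefix @ W_l @ suffix.
        prefix = end_to_end(W[:l]) if l > 0 else np.eye(d_in)
        suffix = end_to_end(W[l + 1:]) if l < L - 1 else np.eye(d_out)
        new_W.append(W[l] - eta * prefix.T @ grad_A @ suffix.T)
    W = new_W

print("final risk:", 0.5 * np.mean((X @ end_to_end(W) - y) ** 2))
```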
Funder
Stavros Niarchos Foundation
Agencia Estatal de Investigación
European Research Council