
Infinite‐width limit of deep linear neural networks

Published: 2024-05-06
Volume: 77, Issue: 10, Pages: 3958–4007
ISSN: 0010-3640
Journal: Communications on Pure and Applied Mathematics (Comm Pure Appl Math)
Language: English

Authors:

Lénaïc Chizat¹, Maria Colombo¹, Xavier Fernández‐Real¹, Alessio Figalli²

Affiliations:

1. EPFL SB MATH, Institute of Mathematics, Lausanne, Switzerland

2. Department of Mathematics, ETH Zurich, Zurich, Switzerland

Abstract

This paper studies the infinite‐width limit of deep linear neural networks (NNs) initialized with random parameters. We show that, as the number of parameters diverges, the training dynamics converge (in a precise sense) to the dynamics of gradient descent on an infinitely wide deterministic linear NN. Moreover, although the weights remain random, we obtain their precise law along the training dynamics and prove a quantitative convergence result for the linear predictor in terms of the number of parameters. Finally, we study the continuous‐time limit obtained for infinitely wide linear NNs and show that the linear predictors of the NN converge at an exponential rate to the minimal ℓ2‐norm minimizer of the risk.
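The abstract's final claim involves two objects worth making concrete: the linear predictor of a deep linear NN (the product of its weight matrices) and the minimum-norm minimizer of the risk (in the ℓ2 norm). The sketch below is a minimal illustration under assumptions not stated in this record: the risk is taken to be the empirical squared loss R(β) = (1/(2n))‖Xβ − y‖² with a full-row-rank design matrix X (n ≤ d), and all function names are hypothetical. It does not reproduce the paper's infinite-width dynamics; it only uses the classical single-layer fact that plain gradient descent on β, started at zero, converges to β* = Xᵀ(XXᵀ)⁻¹y.

```python
# Minimal sketch (hypothetical helper names): the linear predictor of a deep
# linear network, and the minimum-l2-norm minimizer of an ASSUMED squared-loss
# risk R(beta) = (1 / (2n)) * ||X beta - y||^2 with full-row-rank X (n <= d).
# This does not simulate the paper's infinite-width training dynamics.


def transpose(A):
    return [list(col) for col in zip(*A)]


def matmul(A, B):
    """Product of two matrices stored as lists of rows."""
    return [[sum(a * b for a, b in zip(row, col)) for col in zip(*B)] for row in A]


def end_to_end(weights):
    """Collapse x -> W_L ... W_1 x into the single matrix W_L ... W_1.

    `weights` is [W_1, ..., W_L] in the order applied to the input, so a deep
    linear network always computes a linear map (its linear predictor).
    """
    P = weights[0]
    for W in weights[1:]:
        P = matmul(W, P)
    return P


def solve(A, b):
    """Solve A z = b for square invertible A (Gaussian elimination, partial pivoting)."""
    n = len(A)
    M = [list(row) + [bi] for row, bi in zip(A, b)]  # augmented matrix [A | b]
    for k in range(n):
        p = max(range(k, n), key=lambda i: abs(M[i][k]))
        M[k], M[p] = M[p], M[k]
        for i in range(k + 1, n):
            f = M[i][k] / M[k][k]
            for j in range(k, n + 1):
                M[i][j] -= f * M[k][j]
    z = [0.0] * n
    for i in reversed(range(n)):  # back substitution
        z[i] = (M[i][n] - sum(M[i][j] * z[j] for j in range(i + 1, n))) / M[i][i]
    return z


def min_norm_minimizer(X, y):
    """beta* = X^T (X X^T)^{-1} y: the interpolating beta with smallest l2 norm."""
    z = solve(matmul(X, transpose(X)), y)
    return [sum(X[i][j] * z[i] for i in range(len(X))) for j in range(len(X[0]))]


def gd_least_squares(X, y, lr, steps):
    """Gradient descent on R(beta) from beta = 0; iterates stay in the row space of X."""
    n, d = len(X), len(X[0])
    beta = [0.0] * d
    for _ in range(steps):
        r = [sum(X[i][j] * beta[j] for j in range(d)) - y[i] for i in range(n)]
        grad = [sum(X[i][j] * r[i] for i in range(n)) / n for j in range(d)]
        beta = [b - lr * g for b, g in zip(beta, grad)]
    return beta


if __name__ == "__main__":
    X = [[1.0, 0.0, 1.0], [0.0, 1.0, 1.0]]  # n = 2 samples, d = 3 features
    y = [1.0, 2.0]
    print(end_to_end([[[1, 2, 0], [0, 1, 1]], [[1, -1]]]))  # [[1, 1, -1]]
    print(min_norm_minimizer(X, y))  # [0.0, 1.0, 1.0]
    print(gd_least_squares(X, y, lr=0.5, steps=200))  # close to the line above
```

The gradient-descent check works because every update is a combination of the rows of X, so the iterates never pick up a component in the null space of X. This is only the single-layer analogue of the behavior the abstract describes, not a proof or implementation of the paper's result.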

Funders

Stavros Niarchos Foundation

Agencia Estatal de Investigación

European Research Council

Publisher

Wiley

Link

https://onlinelibrary.wiley.com/doi/pdf/10.1002/cpa.22200

References (45 articles)

1. Representations for partially exchangeable arrays of random variables

2. Z. Allen‐Zhu, Y. Li, and Z. Song, A convergence theory for deep learning via over‐parameterization, International Conference on Machine Learning, PMLR, Long Beach, California, 2019, pp. 242–252.

3. S. Arora, N. Cohen, N. Golowich, and W. Hu, A convergence analysis of gradient descent for deep linear neural networks, International Conference on Learning Representations, 2018.

4. S. Arora et al., Implicit regularization in deep matrix factorization, Adv. Neural Inf. Process. Syst., 2019.

5. F. Bach and L. Chizat, Gradient descent on infinitely wide neural networks: global convergence and generalization, International Congress of Mathematicians (ICM 2022), vol. 7, sections 15–20, pp. 5398–5419, 2023, DOI 10.4171/icm2022/121.