Convergence of gradient descent for learning linear neural networks

Journal: Advances in Continuous and Discrete Models (Adv Cont Discr Mod)
Published: 2024-07-18, Volume 2024, Issue 1
ISSN: 2731-4235
Language: English

Authors:

Gabin Maxime Nguegnang, Holger Rauhut, Ulrich Terstiege

Abstract

We study the convergence properties of gradient descent for training deep linear neural networks, i.e., deep matrix factorizations, by extending a previous analysis of the related gradient flow. We show that, under suitable conditions on the stepsizes, gradient descent converges to a critical point of the loss function, which in this article is the square loss. Furthermore, we demonstrate that for almost all initializations gradient descent converges to a global minimum in the case of two layers. In the case of three or more layers, we show that gradient descent converges to a global minimum on the manifold of matrices of some fixed rank, where the rank cannot be determined a priori.
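The setting described in the abstract can be illustrated with a small sketch. It assumes (this form is not stated in the record above) that the square loss is L(W_1, …, W_N) = ½‖W_N⋯W_1 X − Y‖_F² for data matrices X and Y, so that the gradient with respect to W_j is (W_N⋯W_{j+1})ᵀ (W_N⋯W_1 X − Y) Xᵀ (W_{j−1}⋯W_1)ᵀ. The helper names (`gradients`, `gradient_descent`, `random_matrix`) are hypothetical. The code is plain Python so it runs without NumPy, and it uses a small constant stepsize as a simplification; it does not implement the stepsize conditions analysed in the paper.

```python
import random
from typing import List, Tuple

Matrix = List[List[float]]  # a matrix stored as a list of rows


def matmul(a: Matrix, b: Matrix) -> Matrix:
    """Matrix product a @ b."""
    cols = list(zip(*b))
    return [[sum(p * q for p, q in zip(row, col)) for col in cols] for row in a]


def transpose(a: Matrix) -> Matrix:
    return [list(col) for col in zip(*a)]


def identity(n: int) -> Matrix:
    return [[1.0 if i == j else 0.0 for j in range(n)] for i in range(n)]


def random_matrix(rows: int, cols: int, rng: random.Random, scale: float = 1.0) -> Matrix:
    """Entries drawn uniformly from [-scale, scale] (a hypothetical initialization)."""
    return [[rng.uniform(-scale, scale) for _ in range(cols)] for _ in range(rows)]


def residual(ws: List[Matrix], x: Matrix, y: Matrix) -> Matrix:
    """R = W_N ... W_1 X - Y, where ws = [W_1, ..., W_N]."""
    p = x
    for w in ws:
        p = matmul(w, p)
    return [[pv - yv for pv, yv in zip(prow, yrow)] for prow, yrow in zip(p, y)]


def loss(ws: List[Matrix], x: Matrix, y: Matrix) -> float:
    """Assumed square loss 1/2 * ||W_N ... W_1 X - Y||_F^2."""
    return 0.5 * sum(v * v for row in residual(ws, x, y) for v in row)


def gradients(ws: List[Matrix], x: Matrix, y: Matrix) -> List[Matrix]:
    """Gradient of the loss with respect to every factor ws[j]."""
    r_xt = matmul(residual(ws, x, y), transpose(x))  # R X^T
    prefix = [identity(len(x))]  # prefix[j]: product of the factors before ws[j]
    for w in ws[:-1]:
        prefix.append(matmul(w, prefix[-1]))
    suffix = [identity(len(ws[-1]))]  # built back to front, then reversed
    for w in reversed(ws[1:]):
        suffix.append(matmul(suffix[-1], w))
    suffix.reverse()  # suffix[j]: product of the factors after ws[j]
    return [matmul(matmul(transpose(s), r_xt), transpose(p)) for s, p in zip(suffix, prefix)]


def gradient_descent(ws: List[Matrix], x: Matrix, y: Matrix,
                     step: float = 0.02, iters: int = 300) -> Tuple[List[Matrix], List[float]]:
    """Update W_j <- W_j - step * grad_j for all factors at once; returns new factors and loss history."""
    history = [loss(ws, x, y)]
    for _ in range(iters):
        grads = gradients(ws, x, y)
        ws = [[[wv - step * gv for wv, gv in zip(wrow, grow)] for wrow, grow in zip(w, g)]
              for w, g in zip(ws, grads)]
        history.append(loss(ws, x, y))
    return ws, history


if __name__ == "__main__":
    rng = random.Random(0)
    dims = [3, 4, 2]  # d_0 (input), d_1 (hidden), d_2 (output): a two-layer network
    x = random_matrix(dims[0], 5, rng)
    y = random_matrix(dims[-1], 5, rng)
    ws0 = [random_matrix(dims[i + 1], dims[i], rng, 0.5) for i in range(len(dims) - 1)]
    _, history = gradient_descent(ws0, x, y)
    print(f"loss: {history[0]:.4f} -> {history[-1]:.4f}")
```

Running the script prints the loss before and after training a two-layer network with hidden width 4. A decreasing loss in one run is consistent with, but is not a demonstration of, the convergence results stated in the abstract.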

Funder

DAAD

Publisher

Springer Science and Business Media LLC

Link

https://link.springer.com/content/pdf/10.1186/s13662-023-03797-x.pdf

References (31 articles)

1. Absil, P.-A., Mahony, R., Andrews, B.: Convergence of the iterates of descent methods for analytic cost functions. SIAM J. Optim. 16(2), 531–547 (2005)

2. Arora, S., Cohen, N., Golowich, N., Hu, W.: A convergence analysis of gradient descent for deep linear neural networks. In: International Conference on Learning Representations (2019)

3. Arora, S., Cohen, N., Hazan, E.: On the optimization of deep networks: implicit acceleration by overparameterization. In: International Conference on Machine Learning (2018)

4. Arora, S., Cohen, N., Hu, W., Luo, Y.: Implicit regularization in deep matrix factorization. In: Advances in Neural Information Processing Systems, pp. 7413–7424 (2019)

5. Bah, B., Rauhut, H., Terstiege, U., Westdickenberg, M.: Learning deep linear neural networks: Riemannian gradient flows and convergence to global minimizers. Inf. Inference 11(1), 307–353 (2022)