Exact solutions of a deep linear network
- Published: 2023-11-01
- Journal: Journal of Statistical Mechanics: Theory and Experiment (J. Stat. Mech.)
- Volume: 2023, Issue: 11, Page: 114006
- ISSN: 1742-5468
- Authors: Ziyin Liu, Botao Li, Xiangming Meng
Abstract
This work derives the analytical expression for the global minima of a deep linear network with weight decay and stochastic neurons, a fundamental model for understanding the loss landscape of neural networks. Our result implies that the origin is a special point in the deep neural network loss landscape where highly nonlinear phenomena emerge. We show that weight decay interacts strongly with the model architecture and can create bad minima at zero in a network with more than one hidden layer, a behavior qualitatively different from that of a network with only one hidden layer. Practically, our result implies that common deep learning initialization methods are generally insufficient to ease the optimization of neural networks.
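As a minimal numerical sketch (not taken from the paper), the depth dependence described in the abstract can be illustrated with a scalar deep linear network, one weight per layer, fit to a single data point; the data point (x, y) = (1, 1), the weight-decay strength lam = 0.1, and the perturbation size eps are illustrative assumptions, not values from the paper:

```python
from math import prod

def loss(weights, lam=0.1, x=1.0, y=1.0):
    """Squared loss of a scalar deep linear network f(x) = (prod_i w_i) * x,
    plus weight decay lam * sum_i w_i^2."""
    pred = prod(weights) * x
    return (pred - y) ** 2 + lam * sum(w * w for w in weights)

eps = 0.1  # small step away from the origin along the diagonal direction

# One hidden layer (two weight factors): near the origin,
# L ~ 1 - 2*eps^2 + 2*lam*eps^2, which dips below L(0) = 1 when lam < 1,
# so the origin is a saddle, not a minimum.
l1_origin, l1_near = loss([0.0, 0.0]), loss([eps, eps])

# Two hidden layers (three weight factors): the weight-decay term
# 3*lam*eps^2 dominates the cubic fit term -2*eps^3 for small eps,
# so the origin is a (bad) local minimum for any lam > 0.
l2_origin, l2_near = loss([0.0, 0.0, 0.0]), loss([eps, eps, eps])

print(f"depth 1: L(0) = {l1_origin:.4f}, L(eps) = {l1_near:.4f}")
print(f"depth 2: L(0) = {l2_origin:.4f}, L(eps) = {l2_near:.4f}")
```

With these values the one-hidden-layer loss decreases when moving away from zero, while the two-hidden-layer loss increases, matching the qualitative depth dependence the abstract describes.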
Subjects: Statistics, Probability and Uncertainty; Statistics and Probability; Statistical and Nonlinear Physics