Archetypal landscapes for deep neural networks

Published: 2020-08-25
Issue: 36
Volume: 117
Page: 21857-21864
ISSN: 0027-8424
Container-title: Proceedings of the National Academy of Sciences
Language: en
Short-container-title: Proc Natl Acad Sci USA

Author:

Verpoort Philipp C., Lee Alpha A., Wales David J.

Abstract

The predictive capabilities of deep neural networks (DNNs) continue to evolve to increasingly impressive levels. However, it is still unclear how training procedures for DNNs succeed in finding parameters that produce good results for such high-dimensional and nonconvex loss functions. In particular, we wish to understand why simple optimization schemes, such as stochastic gradient descent, do not end up trapped in local minima with high loss values that would not yield useful predictions. We explain the optimizability of DNNs by characterizing the local minima and transition states of the loss-function landscape (LFL) along with their connectivity. We show that the LFL of a DNN in the shallow network or data-abundant limit is funneled, and thus easy to optimize. Crucially, in the opposite low-data/deep limit, although the number of minima increases, the landscape is characterized by many minima with similar loss values separated by low barriers. This organization is different from the hierarchical landscapes of structural glass formers and explains why minimization procedures commonly employed by the machine-learning community can navigate the LFL successfully and reach low-lying solutions.

Publisher

Proceedings of the National Academy of Sciences

Subject

Multidisciplinary

References: 63 articles.

1. A general reinforcement learning algorithm that masters chess, shogi, and Go through self-play

2. A mean field view of the landscape of two-layer neural networks; Song; Proc. Natl. Acad. Sci. U.S.A., 2018

3. The loss surfaces of multilayer networks; Choromanska, 2015

4. S. Hochreiter, J. Schmidhuber, “Simplifying neural nets by discovering flat minima” in NIPS’94: Proceedings of the 7th International Conference on Neural Information Processing Systems, G. Tesauro, D. S. Touretzky, T. K. Leen, Eds. (MIT Press, Cambridge, MA, 1995), pp. 529–536.

5. Flat Minima

Cited by 13 articles.

1. Explainable Gaussian processes: a loss landscape perspective;Machine Learning: Science and Technology;2024-07-23

2. Insights into machine learning models from chemical physics: an energy landscapes approach (EL for ML);Digital Discovery;2024

3. Microscopic image recognition of diatoms based on deep learning;Journal of Phycology;2023-11-23

4. Exploring Gradient Oscillation in Deep Neural Network Training;2023 59th Annual Allerton Conference on Communication, Control, and Computing (Allerton);2023-09-26

5. Data efficiency and extrapolation trends in neural network interatomic potentials;Machine Learning: Science and Technology;2023-08-25