Affiliation:
1. Department of Information Engineering, The Chinese University of Hong Kong, Hong Kong;
2. Department of Industrial and Enterprise Systems Engineering and Coordinated Science Laboratory, University of Illinois at Urbana-Champaign, Urbana, Illinois 61801
Abstract
Does a large width eliminate all suboptimal local minima for neural nets? An affirmative answer was given by a classic result published in 1995 for wide one-hidden-layer neural nets with a sigmoid activation function, but this result has not been extended to the multilayer case. Recently, it was shown that, with piecewise linear activations, suboptimal local minima exist even for wide nets. Given the classic positive result for a smooth activation and the recent negative result for nonsmooth activations, an interesting open question is: Does a large width eliminate all suboptimal local minima for deep neural nets with smooth activations? In this paper, we give a largely negative answer to this question. Specifically, we prove that, for neural networks with generic input data and smooth nonlinear activation functions, suboptimal local minima can exist no matter how wide the network is (as long as the last hidden layer has at least two neurons). Therefore, the classic result of no suboptimal local minima for a one-hidden-layer network does not hold in this broader setting. Whereas the classic result assumes a sigmoid activation, our counterexample covers a large set of activation functions (dense in the set of continuous functions), indicating that the limitation is not an artifact of the specific activation. Together with recent progress on piecewise linear activations, our result indicates that suboptimal local minima are common for wide neural nets.
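To make the setting concrete, the following is a minimal sketch (assuming Python/NumPy; the toy dataset, width, tanh activation, and hyperparameters are arbitrary choices for illustration and are not the paper's counterexample construction). It trains a wide one-hidden-layer network by full-batch gradient descent from several random initializations and compares the loss values it settles at; persistently different plateaus suggest, but do not prove, distinct stationary points.

```python
# Illustrative sketch only: probe the empirical loss landscape of a wide
# one-hidden-layer net with a smooth activation by running gradient descent
# from several initializations and comparing final losses.
import numpy as np

rng = np.random.default_rng(0)

n, d, width = 8, 3, 32           # samples, input dim, hidden width (wide: width >= n)
X = rng.standard_normal((n, d))  # generic input data
y = rng.standard_normal(n)       # generic targets

def act(z):        # smooth nonlinear activation (tanh chosen for illustration)
    return np.tanh(z)

def act_grad(z):   # derivative of tanh
    return 1.0 - np.tanh(z) ** 2

def loss_and_grads(W, b, v):
    """Squared loss of f(x) = v^T act(W x + b) and its gradients."""
    Z = X @ W.T + b              # (n, width) pre-activations
    H = act(Z)                   # (n, width) hidden outputs
    r = H @ v - y                # residuals
    loss = 0.5 * np.mean(r ** 2)
    G = (r[:, None] * act_grad(Z)) * v[None, :] / n   # dLoss/dZ
    dW = G.T @ X
    db = G.sum(axis=0)
    dv = H.T @ r / n
    return loss, dW, db, dv

def train(seed, steps=20000, lr=0.05):
    r = np.random.default_rng(seed)
    W = r.standard_normal((width, d))
    b = r.standard_normal(width)
    v = r.standard_normal(width)
    for _ in range(steps):
        loss, dW, db, dv = loss_and_grads(W, b, v)
        W -= lr * dW
        b -= lr * db
        v -= lr * dv
    return loss

# Different seeds may settle at different loss levels; a strictly positive
# plateau after long training hints at a suboptimal stationary point.
for seed in range(5):
    print(f"seed {seed}: final loss = {train(seed):.6f}")
```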
Publisher
Institute for Operations Research and the Management Sciences (INFORMS)
Subject
Management Science and Operations Research, Computer Science Applications, General Mathematics
Cited by
2 articles.