Flat Minima

Published: 1997-01-01; Volume: 9; Issue: 1; Pages: 1-42
ISSN: 0899-7667
Journal: Neural Computation
Language: English

Authors:

Sepp Hochreiter¹, Jürgen Schmidhuber²

Affiliations:

1. Fakultät für Informatik, Technische Universität München, 80290 München, Germany

2. IDSIA, Corso Elvezia 36, 6900 Lugano, Switzerland

Abstract

We present a new algorithm for finding low-complexity neural networks with high generalization capability. The algorithm searches for a “flat” minimum of the error function. A flat minimum is a large connected region in weight space where the error remains approximately constant. An MDL-based, Bayesian argument suggests that flat minima correspond to “simple” networks and low expected overfitting. The argument is based on a Gibbs algorithm variant and a novel way of splitting generalization error into underfitting and overfitting error. Unlike many previous approaches, ours does not require Gaussian assumptions and does not depend on a “good” weight prior; instead, it uses a prior over input-output functions, thus taking into account net architecture and training set. Although our algorithm requires the computation of second-order derivatives, it has backpropagation's order of complexity. It effectively prunes units, weights, and input lines automatically. Various experiments with feedforward and recurrent nets are described. In an application to stock market prediction, flat minimum search outperforms conventional backprop, weight decay, and “optimal brain surgeon/optimal brain damage.”
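The abstract defines a flat minimum as a large connected region of weight space where the error stays approximately constant, and ties flatness to short (MDL-style) descriptions. The toy sketch below illustrates only that intuition; it is not the authors' flat minimum search, which per the abstract uses second-order derivatives at backprop cost. Everything here is an assumption for illustration: the hypothetical helpers `make_quadratic_error`, `tolerance_along_axis`, and `flatness_report`, the quadratic error, the tolerance `eps`, the scan `step`, the weight range, and the rough bit-count formula `log2(range / tolerance)`.

```python
import math
from typing import Callable, List, Sequence, Tuple

ErrorFn = Callable[[Sequence[float]], float]


def make_quadratic_error(curvatures: Sequence[float]) -> ErrorFn:
    """Hypothetical toy error E(w) = sum_i c_i * w_i^2, minimized at w = 0.
    Small curvatures give a flat minimum, large ones a sharp minimum."""
    def error(w: Sequence[float]) -> float:
        return sum(c * wi * wi for c, wi in zip(curvatures, w))
    return error


def tolerance_along_axis(error: ErrorFn, w_star: Sequence[float], axis: int,
                         eps: float, step: float = 1e-3,
                         max_radius: float = 10.0) -> float:
    """Scan one weight outward from w_star in both directions and return the
    largest radius reached before the error rises more than eps above
    E(w_star). Stopping at the first violation keeps the interval connected."""
    base = error(w_star)
    n_steps = int(round(max_radius / step))
    radius = 0.0
    for k in range(1, n_steps + 1):
        r = k * step
        for sign in (1.0, -1.0):
            w = list(w_star)
            w[axis] += sign * r
            if error(w) - base > eps:
                return radius
        radius = r
    return radius


def flatness_report(error: ErrorFn, w_star: Sequence[float], eps: float,
                    step: float = 1e-3, max_radius: float = 10.0
                    ) -> Tuple[List[float], float]:
    """Per-weight tolerances plus a rough MDL-style description length.

    Assumption (illustrative only): each weight lies in [-max_radius, max_radius],
    and a weight that may vary by +/- delta needs about
    log2(2 * max_radius / (2 * delta)) bits. Axes are probed one at a time,
    so joint perturbations inside the box are not guaranteed to stay within eps.
    """
    deltas = [tolerance_along_axis(error, w_star, i, eps, step, max_radius)
              for i in range(len(w_star))]
    # Floor each tolerance at one scan step so log2 never sees zero.
    bits = sum(math.log2(max_radius / max(d, step)) for d in deltas)
    return deltas, bits


if __name__ == "__main__":
    flat = make_quadratic_error([0.01, 0.01])
    sharp = make_quadratic_error([100.0, 100.0])
    for name, err in (("flat", flat), ("sharp", sharp)):
        deltas, bits = flatness_report(err, [0.0, 0.0], eps=0.01)
        print(f"{name}: tolerances={[round(d, 3) for d in deltas]}, bits={bits:.1f}")
```

On this toy problem the flat bowl tolerates perturbations of roughly ±1 per weight and needs about 6.6 bits in total, while the sharp bowl tolerates only about ±0.01 per weight and needs about 20 bits. That matches, in miniature, the abstract's claim that flatter minima admit shorter descriptions; a per-axis scan like this is far too crude and expensive for real networks.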

Publisher

MIT Press - Journals

Subject

Cognitive Neuroscience, Arts and Humanities (miscellaneous)

Link

https://www.mitpressjournals.org/doi/pdf/10.1162/neco.1997.9.1.1

References: 32 articles

1. Statistical predictor identification

2. Statistical Theory of Learning Curves under Entropic Loss Criterion

3. Dynamic Node Creation in Backpropagation Networks

4. Curvature-driven smoothing: a learning algorithm for feedforward networks

Cited by: 303 articles

1. AAontology: An Ontology of Amino Acid Scales for Interpretable Machine Learning. Journal of Molecular Biology, 2024-10.

2. Boosting sharpness-aware training with dynamic neighborhood. Pattern Recognition, 2024-09.

3. A Comprehensive Survey of Continual Learning: Theory, Method and Application. IEEE Transactions on Pattern Analysis and Machine Intelligence, 2024-08.

4. Brain-inspired dual-pathway neural network architecture and its generalization analysis. Science China Technological Sciences, 2024-07-30.

5. Weight fluctuations in deep linear neural networks and a derivation of the inverse-variance flatness relation. Physical Review Research, 2024-07-25.