Statistical Theory of Learning Curves under Entropic Loss Criterion

Published: 1993-01, Volume: 5, Issue: 1, Pages: 140-153
ISSN: 0899-7667
Journal: Neural Computation
Language: English

Authors:

Amari Shun-ichi¹, Murata Noboru¹

Affiliation:

1. Department of Mathematical Engineering and Information Physics, University of Tokyo, Bunkyo-ku, Tokyo 113, Japan

Abstract

The present paper elucidates a universal property of learning curves: how the generalization error, the training error, and the complexity of the underlying stochastic machine are related, and how the behavior of a stochastic machine improves as the number of training examples increases. Error is measured by the entropic loss. It is proved that the generalization error converges to H₀, the entropy of the conditional distribution of the true machine, as H₀ + m*/(2t), while the training error converges as H₀ − m*/(2t), where t is the number of examples and m* measures the complexity of the network. When the model is faithful, meaning that the true machine is contained in the model, m* reduces to m, the number of modifiable parameters. This law is universal because it holds for any regular machine, irrespective of its structure, when the parameters are estimated by maximum likelihood. Similar relations are obtained for the Bayes and Gibbs learning algorithms. These learning curves show the relationship among the accuracy of learning, the complexity of a model, and the number of training examples.
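The faithful-model law stated above (generalization error ≈ H₀ + m/(2t), training error ≈ H₀ − m/(2t)) can be checked numerically on a very simple case. The sketch below is an illustration under assumptions not taken from the paper: the "machine" is an unconditional Gaussian (a degenerate conditional distribution with no input), the true distribution is N(0, 1), and the model N(μ, σ²) has m = 2 parameters fitted by maximum likelihood. The function name `entropic_learning_curve`, the seed, and the trial counts are hypothetical choices; losses are in nats.

```python
import numpy as np


def entropic_learning_curve(t, trials=20000, seed=0):
    """Monte Carlo averages of the training and generalization entropic
    losses for a maximum likelihood Gaussian fit.

    Assumed setup (illustration only): the true machine is N(0, 1) and the
    model is the two-parameter family N(mu, sigma^2), so the model is
    faithful and m* = m = 2.
    """
    rng = np.random.default_rng(seed)
    x = rng.standard_normal((trials, t))   # t training examples per trial
    mu_hat = x.mean(axis=1)                # MLE of the mean
    var_hat = x.var(axis=1)                # MLE of the variance (ddof=0)

    log_norm = 0.5 * np.log(2 * np.pi * var_hat)
    # Training error: average of -log p_hat(x_i) over the training sample.
    # At the MLE, sum((x_i - mu_hat)^2) / (t * var_hat) equals 1 exactly.
    train = log_norm + 0.5
    # Generalization error: E_{x ~ N(0,1)}[-log p_hat(x)] in closed form,
    # using E[(x - mu_hat)^2] = 1 + mu_hat^2 for the true N(0, 1).
    gen = log_norm + (1.0 + mu_hat**2) / (2.0 * var_hat)
    return train.mean(), gen.mean()


H0 = 0.5 * np.log(2 * np.pi * np.e)  # entropy of the true N(0, 1), in nats
m = 2                                # number of modifiable parameters

for t in (25, 50, 100, 200, 400):
    train, gen = entropic_learning_curve(t)
    print(f"t={t:4d}  train-H0={train - H0:+.4f}  "
          f"gen-H0={gen - H0:+.4f}  m/(2t)={m / (2 * t):.4f}")
```

Multiplying each printed excess by t should give values near −1 and +1 (that is, ∓m/2), with small higher-order corrections at small t. Subtracting the two laws in the abstract also predicts a gap of about m*/t between the average generalization and training errors, which the printed columns should reflect.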

Publisher

MIT Press - Journals

Subject

Cognitive Neuroscience, Arts and Humanities (miscellaneous)

Link

https://www.mitpressjournals.org/doi/pdf/10.1162/neco.1993.5.1.140

Cited by 83 articles.

1. Sequential Prediction. Learning with the Minimum Description Length Principle, 2023.

2. Recent advances in algebraic geometry and Bayesian statistics. Information Geometry, 2022-12-06.

3. Developmental and evolutionary constraints on olfactory circuit selection. Proceedings of the National Academy of Sciences, 2022-03-09.

4. The Shape of Learning Curves: A Review. IEEE Transactions on Pattern Analysis and Machine Intelligence, 2022.

5. Accurate and flexible neural-network interatomic potential for mixed materials: TiₓZr₁₋ₓO₂ from bulk to clusters and nanoparticles. Physical Review Materials, 2021-06-29.