An analytic theory of shallow networks dynamics for hinge loss classification*

Published:2021-12-01 Issue:12 Volume:2021 Page:124005
ISSN:1742-5468
Container-title:Journal of Statistical Mechanics: Theory and Experiment
Short-container-title:J. Stat. Mech.

Author:

Pellegrini Franco, Biroli Giulio

Abstract

Neural networks have been shown to perform incredibly well in classification tasks over structured high-dimensional datasets. However, the learning dynamics of such networks is still poorly understood. In this paper we study in detail the training dynamics of a simple type of neural network: a single hidden layer trained to perform a classification task. We show that in a suitable mean-field limit this case maps to a single-node learning problem with a time-dependent dataset determined self-consistently from the average nodes population. We specialize our theory to the prototypical case of linearly separable data and a linear hinge loss, for which the dynamics can be explicitly solved in the infinite dataset limit. This allows us to address in a simple setting several phenomena appearing in modern networks such as slowing down of training dynamics, crossover between rich and lazy learning, and overfitting. Finally, we assess the limitations of mean-field theory by studying the case of a large but finite number of nodes and of training samples.

Publisher

IOP Publishing

Subject

Statistics, Probability and Uncertainty; Statistics and Probability; Statistical and Nonlinear Physics

Link

https://iopscience.iop.org/article/10.1088/1742-5468/ac3a76/pdf

Reference: 39 articles.

1. Deep learning; LeCun; Nature, 2015

2. Universal approximation bounds for superpositions of a sigmoidal function; Barron; IEEE Trans. Inf. Theory, 1993

3. Theory of deep learning: III. Explaining the non-overfitting puzzle; Poggio, 2017

4. Connecting optimization and regularization paths; Suggala, 2018

5. Implicit regularization of discrete gradient dynamics in linear neural networks; Gidel, 2019