Deep learning: a statistical viewpoint

Published: 2021-05
Volume: 30
Pages: 87-201
ISSN: 0962-4929
Journal: Acta Numerica
Language: English

Authors:

Bartlett, Peter L.; Montanari, Andrea; Rakhlin, Alexander

Abstract

The remarkable practical success of deep learning has revealed some major surprises from a theoretical perspective. In particular, simple gradient methods easily find near-optimal solutions to non-convex optimization problems, and despite giving a near-perfect fit to training data without any explicit effort to control model complexity, these methods exhibit excellent predictive accuracy. We conjecture that specific principles underlie these phenomena: that overparametrization allows gradient methods to find interpolating solutions, that these methods implicitly impose regularization, and that overparametrization leads to benign overfitting, that is, accurate predictions despite overfitting training data. In this article, we survey recent progress in statistical learning theory that provides examples illustrating these principles in simpler settings. We first review classical uniform convergence results and why they fall short of explaining aspects of the behaviour of deep learning methods. We give examples of implicit regularization in simple settings, where gradient methods lead to minimal norm functions that perfectly fit the training data. Then we review prediction methods that exhibit benign overfitting, focusing on regression problems with quadratic loss. For these methods, we can decompose the prediction rule into a simple component that is useful for prediction and a spiky component that is useful for overfitting but, in a favourable setting, does not harm prediction accuracy. We focus specifically on the linear regime for neural networks, where the network can be approximated by a linear model. In this regime, we demonstrate the success of gradient flow, and we consider benign overfitting with two-layer networks, giving an exact asymptotic analysis that precisely demonstrates the impact of overparametrization. We conclude by highlighting the key challenges that arise in extending these insights to realistic deep learning settings.
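The abstract's example of implicit regularization, where gradient methods reach a minimum-norm function that perfectly fits the training data, can be seen in its simplest case: overparametrized linear least squares. The sketch below is an illustrative toy, not code from the paper. It assumes a random Gaussian design with n = 3 samples and d = 8 features, and the helper names (matvec, solve, gradient_descent, min_norm_interpolant) are hypothetical. Gradient descent starts at w = 0, and every gradient X^T(Xw - y) lies in the row space of X, so the iterates never leave that space. The only interpolant in that space is the minimum-norm one, w* = X^T (X X^T)^{-1} y.

```python
import random


def matvec(A, v):
    """Multiply a matrix (list of rows) by a vector."""
    return [sum(a * b for a, b in zip(row, v)) for row in A]


def transpose(A):
    return [list(col) for col in zip(*A)]


def solve(A, b):
    """Solve A x = b by Gaussian elimination with partial pivoting."""
    n = len(A)
    M = [row[:] + [bi] for row, bi in zip(A, b)]  # augmented matrix [A | b]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(M[r][col]))
        M[col], M[pivot] = M[pivot], M[col]
        for r in range(col + 1, n):
            factor = M[r][col] / M[col][col]
            for c in range(col, n + 1):
                M[r][c] -= factor * M[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):  # back substitution
        x[r] = (M[r][n] - sum(M[r][c] * x[c] for c in range(r + 1, n))) / M[r][r]
    return x


def gradient_descent(X, y, steps=20000):
    """Minimise 0.5 * ||X w - y||^2 by plain gradient descent from w = 0."""
    # Step size 1 / ||X||_F^2 <= 1 / lambda_max(X^T X), which keeps the iteration stable.
    lr = 1.0 / sum(v * v for row in X for v in row)
    Xt = transpose(X)
    w = [0.0] * len(X[0])
    for _ in range(steps):
        residual = [p - t for p, t in zip(matvec(X, w), y)]
        grad = matvec(Xt, residual)  # X^T (X w - y): always in the row space of X
        w = [wi - lr * gi for wi, gi in zip(w, grad)]
    return w


def min_norm_interpolant(X, y):
    """Closed form w* = X^T (X X^T)^{-1} y, the smallest-norm w with X w = y."""
    gram = [[sum(a * b for a, b in zip(ri, rj)) for rj in X] for ri in X]
    return matvec(transpose(X), solve(gram, y))


random.seed(0)
n, d = 3, 8  # fewer samples than parameters: infinitely many interpolants exist
X = [[random.gauss(0.0, 1.0) for _ in range(d)] for _ in range(n)]
y = [random.gauss(0.0, 1.0) for _ in range(n)]

w_gd = gradient_descent(X, y)
w_star = min_norm_interpolant(X, y)
train_err = max(abs(p - t) for p, t in zip(matvec(X, w_gd), y))
gap = max(abs(a - b) for a, b in zip(w_gd, w_star))
print(f"max training residual:   {train_err:.1e}")
print(f"max |w_gd - w_min_norm|: {gap:.1e}")
```

Nothing in the objective 0.5 * ||Xw - y||^2 prefers small norms. The preference comes from the algorithm and its starting point. Starting from a nonzero w would generally converge to a different interpolant, namely the projection of that starting point onto the set {w : Xw = y}.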

Publisher

Cambridge University Press (CUP)

Subject

General Mathematics, Numerical Analysis

References: 167 articles

Cited by 74 articles.

1. Real-time data visual monitoring of triboelectric nanogenerators enabled by Deep learning; Nano Energy; 2024-11

2. Out-of-distributional risk bounds for neural operators with applications to the Helmholtz equation; Journal of Computational Physics; 2024-09

3. Learning Time-Scales in Two-Layers Neural Networks; Foundations of Computational Mathematics; 2024-08-22

4. Leveraging small-scale datasets for additive manufacturing process modeling and part certification: Current practice and remaining gaps; Journal of Manufacturing Systems; 2024-08

5. Multitask methods for predicting molecular properties from heterogeneous data; The Journal of Chemical Physics; 2024-07-03