Wide and deep neural networks achieve consistency for classification

Published: 2023-03-30 Issue: 14 Volume: 120
ISSN: 0027-8424
Container-title: Proceedings of the National Academy of Sciences
Language: en
Short-container-title: Proc. Natl. Acad. Sci. U.S.A.

Author:

Radhakrishnan Adityanarayanan¹²³, Belkin Mikhail⁴⁵, Uhler Caroline¹²³

Affiliation:

1. Laboratory for Information & Decision Systems, Massachusetts Institute of Technology, Cambridge, MA 02142

2. Institute for Data, Systems, and Society, Massachusetts Institute of Technology, Cambridge, MA 02142

3. Broad Institute, Massachusetts Institute of Technology, Cambridge, MA 02142

4. Halicioğlu Data Science Institute, University of California, San Diego, CA 92093

5. Computer Science and Engineering, University of California, San Diego, CA 92093

Abstract

While neural networks are used for classification tasks across domains, a long-standing open problem in machine learning is determining whether neural networks trained using standard procedures are consistent for classification, i.e., whether such models minimize the probability of misclassification for arbitrary data distributions. In this work, we identify and construct an explicit set of neural network classifiers that are consistent. Since effective neural networks in practice are typically both wide and deep, we analyze infinitely wide networks that are also infinitely deep. In particular, using the recent connection between infinitely wide neural networks and neural tangent kernels, we provide explicit activation functions that can be used to construct networks that achieve consistency. Interestingly, these activation functions are simple and easy to implement, yet differ from commonly used activations such as ReLU or sigmoid. More generally, we create a taxonomy of infinitely wide and deep networks and show that these models implement one of three well-known classifiers depending on the activation function used: 1) 1-nearest neighbor (model predictions are given by the label of the nearest training example); 2) majority vote (model predictions are given by the label of the class with the greatest representation in the training set); or 3) singular kernel classifiers (a set of classifiers containing those that achieve consistency). Our results highlight the benefit of using deep networks for classification tasks, in contrast to regression tasks, where excessive depth is harmful.
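As an illustration only, the three classifier families named in the abstract can be sketched on toy data. The abstract gives no formulas, so the singular kernel form used here (weights proportional to 1 / ||x - x_i||^alpha), the exponent `alpha`, the function names, and the data are all assumptions made for this sketch and are not taken from the paper.

```python
import numpy as np


def one_nearest_neighbor(X_train, y_train, x):
    """Return the label of the training example closest to x."""
    dists = np.linalg.norm(X_train - x, axis=1)
    return y_train[np.argmin(dists)]


def majority_vote(y_train):
    """Return the most frequent training label; the query point is ignored."""
    labels, counts = np.unique(y_train, return_counts=True)
    return labels[np.argmax(counts)]


def singular_kernel_classifier(X_train, y_train, x, alpha=2.0):
    """Sign of a vote weighted by 1 / ||x - x_i||**alpha, labels in {-1, +1}.

    ASSUMPTION: this kernel form and the value of alpha are illustrative
    choices for the sketch, not the construction used in the paper.
    """
    dists = np.linalg.norm(X_train - x, axis=1)
    hit = dists == 0
    if hit.any():
        # The kernel blows up at training points, so the classifier
        # reproduces the label of the first exact match.
        return int(y_train[np.argmax(hit)])
    weights = dists ** (-alpha)
    return 1 if weights @ y_train >= 0 else -1


# Hypothetical toy data: three positive points near the origin, one negative point.
X_train = np.array([[0.0, 0.0], [1.0, 0.0], [0.0, 1.0], [3.0, 3.0]])
y_train = np.array([1, 1, 1, -1])
query = np.array([1.9, 1.9])

print("1-nearest neighbor:", one_nearest_neighbor(X_train, y_train, query))
print("majority vote:     ", majority_vote(y_train))
print("singular kernel:   ", singular_kernel_classifier(X_train, y_train, query))
```

On this toy set the query is closest to the single negative point, so 1-nearest neighbor and the weighted vote with `alpha=2.0` can disagree. Raising `alpha` concentrates the weight on the closest point and pushes the weighted vote toward the nearest-neighbor answer.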

Funder

National Science Foundation

Simons Foundation

US | USN | Office of Naval Research

MIT-IBM Watson AI Lab

Eric and Wendy Schmidt Center at the Broad Institute

HHS | NIH | National Center for Complementary and Integrative Health

Publisher

Proceedings of the National Academy of Sciences

Subject

Multidisciplinary

Link

https://pnas.org/doi/pdf/10.1073/pnas.2208779120

Cited by 9 articles.

1. Evaluating Feature Selection Algorithms for Machine Learning-Based Musical Instrument Identification in Monophonic Recordings;Sakarya University Journal of Computer and Information Sciences;2024-08-31

2. Novel Prognostic Methodology of Bootstrap Forest and Hyperbolic Tangent Boosted Neural Network for Aircraft System;Applied Sciences;2024-06-10

3. Inverse Optical Waveguide Analysis: Leveraging Intensity Distributions to Infer Waveguide Structures;IEEE Photonics Technology Letters;2024-05-15

4. Optimizing Image Enhancement: Feature Engineering for Improved Classification in AI-Assisted Artificial Retinas;Sensors;2024-04-23

5. Research on Cost Estimation of Launch Vehicle Based on Grey Neural Network;Lecture Notes in Business Information Processing;2024