Learning curves for the multi-class teacher–student perceptron

Published:2023-02-14 Issue:1 Volume:4 Page:015019
ISSN:2632-2153
Container-title:Machine Learning: Science and Technology
Short-container-title:Mach. Learn.: Sci. Technol.

Author:

Cornacchia Elisabetta, Mignacco Francesca, Veiga Rodrigo, Gerbelot Cédric, Loureiro Bruno, Zdeborová Lenka

Abstract

One of the most classical results in high-dimensional learning theory provides a closed-form expression for the generalisation error of binary classification with a single-layer teacher–student perceptron on i.i.d. Gaussian inputs. Both Bayes-optimal (BO) estimation and empirical risk minimisation (ERM) have been extensively analysed in this setting. At the same time, a considerable part of modern machine learning practice concerns multi-class classification, yet an analogous analysis for the multi-class teacher–student perceptron was missing. In this manuscript we fill this gap by deriving and evaluating asymptotic expressions for the BO and ERM generalisation errors in the high-dimensional regime. For a Gaussian teacher, we investigate the performance of ERM with both cross-entropy and square losses, and explore the role of ridge regularisation in approaching Bayes-optimality. In particular, we observe that regularised cross-entropy minimisation yields close-to-optimal accuracy. For a Rademacher teacher, by contrast, we show that a first-order phase transition arises in the BO performance.
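The setting described in the abstract can be illustrated with a small finite-size simulation. The sketch below is not the paper's asymptotic analysis: it draws a Gaussian teacher, labels i.i.d. Gaussian inputs by the teacher's largest output, fits a student by ridge-regularised square loss on one-hot labels (closed form), and estimates the generalisation error on fresh samples. The function names, the argmax labelling rule, the sample ratio alpha = n/d, and all numerical values are assumptions chosen for illustration.

```python
import numpy as np

def sample_data(teacher, n, rng):
    """Draw n i.i.d. Gaussian inputs and label them with the teacher's argmax."""
    k, d = teacher.shape
    x = rng.standard_normal((n, d))
    y = np.argmax(x @ teacher.T / np.sqrt(d), axis=1)
    return x, y

def ridge_square_loss_student(x, y, k, lam):
    """Closed-form minimiser of the square loss on one-hot labels with ridge penalty lam."""
    n, d = x.shape
    one_hot = np.eye(k)[y]                         # shape (n, k)
    gram = x.T @ x + lam * np.eye(d)               # shape (d, d)
    return np.linalg.solve(gram, x.T @ one_hot).T  # shape (k, d)

def generalisation_error(student, teacher, n_test, rng):
    """Fraction of fresh samples on which the student and teacher labels differ."""
    x, y = sample_data(teacher, n_test, rng)
    y_hat = np.argmax(x @ student.T, axis=1)
    return float(np.mean(y_hat != y))

def learning_curve(d=50, k=3, alphas=(0.5, 2.0, 8.0), lam=1.0, seed=0):
    """Empirical test error for several sample ratios alpha = n / d (assumed scaling)."""
    rng = np.random.default_rng(seed)
    teacher = rng.standard_normal((k, d))          # Gaussian teacher weights
    errors = []
    for alpha in alphas:
        n = int(alpha * d)
        x, y = sample_data(teacher, n, rng)
        student = ridge_square_loss_student(x, y, k, lam)
        errors.append(generalisation_error(student, teacher, 5000, rng))
    return errors

errors = learning_curve()
for alpha, err in zip((0.5, 2.0, 8.0), errors):
    print(f"alpha = {alpha:4.1f}  test error = {err:.3f}")
```

Plotting the error against alpha for larger d gives an empirical learning curve of the kind the asymptotic expressions are meant to describe; swapping the closed-form solver for a cross-entropy minimiser would be the natural next variation.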

Funder

Coordenação de Aperfeiçoamento de Pessoal de Nível Superior

H2020 European Research Council

Publisher

IOP Publishing

Subject

Artificial Intelligence, Human-Computer Interaction, Software

Link

https://iopscience.iop.org/article/10.1088/2632-2153/acb428/pdf

References: 46 articles.

1. Three unfinished works on the optimal storage capacity of networks;Gardner;J. Phys. A: Math. Gen.,1989

2. Statistical mechanics of learning from examples;Seung;Phys. Rev. A,1992

3. The statistical mechanics of learning a rule;Watkin;Rev. Mod. Phys.,1993

4. First-order transition to perfect generalization in a neural network with binary synapses;Györgyi;Phys. Rev. A,1990

Cited by 3 articles.

1. Phase transitions in the mini-batch size for sparse and dense two-layer neural networks;Machine Learning: Science and Technology;2024-01-23

2. Neural-prior stochastic block model;Machine Learning: Science and Technology;2023-08-17

3. A Comparative Analysis of Student Grade Prediction and Classification Using Default, Onevsone, and H2o Automl Ensemble Analysis;2023