For interpolating kernel machines, minimizing the norm of the ERM solution maximizes stability

Published: 2022-12-28 Issue: 01 Volume: 21 Page: 193-215
ISSN: 0219-5305
Container-title: Analysis and Applications
Language: en
Short-container-title: Anal. Appl.

Author:

Rangamani Akshay¹, Rosasco Lorenzo¹,², Poggio Tomaso¹

Affiliation:

1. Center for Brains, Minds and Machines, McGovern Institute for Brain Research, MIT, 43 Vassar St, Cambridge, MA 02139, USA

2. MaLGa, DIBRIS, Università di Genova, Italy

Abstract

In this paper, we study kernel ridge-less regression, including the case of interpolating solutions. We prove that maximizing the leave-one-out ($\mathrm{CV}_{\mathrm{loo}}$) stability minimizes the expected error. We further prove that the minimum norm solution, to which gradient algorithms are known to converge, is the most stable solution. More precisely, we show that the minimum norm interpolating solution minimizes a bound on $\mathrm{CV}_{\mathrm{loo}}$ stability, which in turn is controlled by the smallest singular value, hence the condition number, of the empirical kernel matrix. These quantities can be characterized in the asymptotic regime where both the dimension ($d$) and cardinality ($n$) of the data go to infinity (with $n/d \to \gamma$ as $d, n \to \infty$). Our results suggest that the property of $\mathrm{CV}_{\mathrm{loo}}$ stability of the learning algorithm with respect to perturbations of the training set may provide a more general framework than the classical theory of Empirical Risk Minimization (ERM). While ERM was developed to deal with the classical regime in which the architecture of the learning network is fixed and $n \to \infty$, the modern regime focuses on interpolating regressors and overparameterized models, when both $d$ and $n$ go to infinity. Since the stability framework is known to be equivalent to the classical theory in the classical regime, our results here suggest that it may be interesting to extend it beyond kernel regression to other overparameterized algorithms such as deep networks.
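
The quantities named in the abstract can be illustrated numerically. The following is a minimal sketch, not the authors' code: it assumes a Gaussian kernel, synthetic random data, and NumPy, and every function name here (`gaussian_kernel`, `min_norm_interpolant`, `loo_errors`) is hypothetical. It computes the minimum norm interpolating coefficients through the pseudoinverse of the empirical kernel matrix, that matrix's smallest singular value and condition number, and a simple leave-one-out prediction error used as a rough proxy for leave-one-out stability.

```python
import numpy as np


def gaussian_kernel(A, B, bandwidth=1.0):
    """Gaussian (RBF) kernel matrix between the rows of A and the rows of B."""
    sq_dists = (
        np.sum(A**2, axis=1)[:, None]
        + np.sum(B**2, axis=1)[None, :]
        - 2.0 * A @ B.T
    )
    # Clip tiny negative values caused by floating-point round-off.
    return np.exp(-np.maximum(sq_dists, 0.0) / (2.0 * bandwidth**2))


def min_norm_interpolant(K, y):
    """Coefficients c = K^+ y, the minimum norm solution of K c = y."""
    return np.linalg.pinv(K) @ y


def loo_errors(K, y):
    """Absolute prediction error at each point when it is left out of training."""
    n = len(y)
    errs = np.empty(n)
    for i in range(n):
        keep = np.arange(n) != i
        c_i = min_norm_interpolant(K[np.ix_(keep, keep)], y[keep])
        errs[i] = abs(K[i, keep] @ c_i - y[i])
    return errs


# Synthetic data (an assumption for illustration only).
rng = np.random.default_rng(0)
n, d = 20, 5
X = rng.standard_normal((n, d))
y = rng.standard_normal(n)

K = gaussian_kernel(X, X)
c = min_norm_interpolant(K, y)

# Singular values are returned in descending order.
singular_values = np.linalg.svd(K, compute_uv=False)
sigma_min = singular_values[-1]
condition_number = singular_values[0] / sigma_min

rkhs_norm = np.sqrt(c @ K @ c)              # RKHS norm of the interpolant
train_residual = np.max(np.abs(K @ c - y))  # near zero: the solution interpolates
errs = loo_errors(K, y)

print(f"sigma_min={sigma_min:.3e}  cond={condition_number:.3e}")
print(f"rkhs_norm={rkhs_norm:.3f}  mean LOO error={errs.mean():.3f}")
```

Shrinking the `bandwidth` argument or changing the ratio of `n` to `d` alters the conditioning of `K`, which is one way to explore how the smallest singular value relates to the leave-one-out errors in this toy setting.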

Publisher

World Scientific Pub Co Pte Ltd

Subject

Applied Mathematics,Analysis

Link

https://www.worldscientific.com/doi/pdf/10.1142/S0219530522400115

References: 19 articles.

1. A revisitation of formulae for the Moore–Penrose inverse of modified matrices

2. Reconciling modern machine-learning practice and the classical bias–variance trade-off

3. Theory of Classification: a Survey of Some Recent Advances

Cited by 2 articles.

1. Investigating over-parameterized randomized graph networks; Neurocomputing; 2024-11

2. Sample complexity bounds for the local convergence of least squares approximation; Analysis and Applications; 2024-04-05