Two-layer neural network on infinite-dimensional data: global optimization guarantee in the mean-field regime

Published:2023-11-01 Issue:11 Volume:2023 Page:114007
ISSN:1742-5468
Container-title:Journal of Statistical Mechanics: Theory and Experiment
Short-container-title:J. Stat. Mech.

Author:

Nishikawa Naoki,Suzuki Taiji,Nitanda Atsushi,Wu Denny

Abstract

The analysis of neural network optimization in the mean-field regime is important as the setting allows for feature learning. The existing theory has been developed mainly for neural networks in finite dimensions, i.e. each neuron has a finite-dimensional parameter. However, the setting of infinite-dimensional input naturally arises in machine learning problems such as nonparametric functional data analysis and graph classification. In this paper, we develop a new mean-field analysis of a two-layer neural network in an infinite-dimensional parameter space. We first give a generalization error bound, which shows that the regularized empirical risk minimizer properly generalizes when the data size is sufficiently large, despite the neurons being infinite-dimensional. Next, we present two gradient-based optimization algorithms for infinite-dimensional mean-field networks, by extending the recently developed particle optimization framework to the infinite-dimensional setting. We show that the proposed algorithms converge to the (regularized) global optimal solution, and moreover, their rates of convergence are of polynomial order in the online setting and exponential order in the finite sample setting, respectively. To the best of our knowledge, this is the first quantitative global optimization guarantee of a neural network on infinite-dimensional input and in the presence of feature learning.

Publisher

IOP Publishing

Subject

Statistics, Probability and Uncertainty,Statistics and Probability,Statistical and Nonlinear Physics

Link

https://iopscience.iop.org/article/10.1088/1742-5468/ad01b2/pdf

References: 36 articles.

1. Information-theoretic lower bounds on the oracle complexity of convex optimization;Agarwal,2009

2. Approximation of the invariant measure with an Euler scheme for stochastic PDEs driven by space-time white noise;Bréhier;Potential Anal.,2014

3. Optimal rates for the regularized least-squares algorithm;Caponnetto;Found. Comput. Math.,2007

4. A generalized neural tangent kernel analysis for two-layer neural networks;Chen,2020

5. Mean-field Langevin dynamics: exponential convergence and annealing;Chizat,2022