Author:
Alessandro Favero, Francesco Cagnetta, Matthieu Wyart
Abstract
Convolutional neural networks perform a local and translationally-invariant treatment of the data: quantifying which of these two aspects is central to their success remains a challenge. We study this problem within a teacher–student framework for kernel regression, using ‘convolutional’ kernels inspired by the neural tangent kernel of simple convolutional architectures of given filter size. Using heuristic methods from physics, we find in the ridgeless case that locality is key in determining the learning curve exponent β (which relates the test error ϵ_t ∼ P^{−β} to the size of the training set P), whereas translational invariance is not. In particular, if the filter size of the teacher t is smaller than that of the student s, β is a function of s only and does not depend on the input dimension. We confirm our predictions on β empirically. We conclude by proving, under a natural universality assumption, that performing kernel regression with a ridge that decreases with the size of the training set leads to similar learning curve exponents to those we obtain in the ridgeless case.
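The scaling law ϵ_t ∼ P^{−β} in the abstract can be estimated numerically by fitting the log–log slope of the test error against the training-set size. Below is a minimal Python sketch under simplifying assumptions, not the paper's actual setup: a Laplace kernel stands in for the NTK-inspired convolutional kernels, a simple deterministic teacher replaces the paper's teacher, and the ridgeless limit is approximated with a tiny ridge for numerical stability.

import numpy as np

rng = np.random.default_rng(0)
d = 8  # input dimension (illustrative choice)

def laplace_kernel(X, Y):
    # K(x, y) = exp(-||x - y||): a stand-in for the paper's convolutional kernels
    dists = np.linalg.norm(X[:, None, :] - Y[None, :, :], axis=-1)
    return np.exp(-dists)

def teacher(X):
    # hypothetical smooth target; the paper's teachers are drawn from a kernel instead
    return np.cos(X.sum(axis=1))

def test_error(P, P_test=1000, ridge=1e-10):
    # kernel "ridgeless" regression: the tiny ridge is only for numerical stability
    X, X_test = rng.standard_normal((P, d)), rng.standard_normal((P_test, d))
    alpha = np.linalg.solve(laplace_kernel(X, X) + ridge * np.eye(P), teacher(X))
    y_pred = laplace_kernel(X_test, X) @ alpha
    return np.mean((y_pred - teacher(X_test)) ** 2)

Ps = np.array([64, 128, 256, 512, 1024])
errs = np.array([test_error(P) for P in Ps])

# slope of log(error) vs log(P) gives -beta, since error ~ P^{-beta}
beta = -np.polyfit(np.log(Ps), np.log(errs), 1)[0]
print(f"estimated learning-curve exponent beta ≈ {beta:.2f}")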
Subject
Statistics, Probability and Uncertainty; Statistics and Probability; Statistical and Nonlinear Physics
Cited by
2 articles.