Limitations of neural network training due to numerical instability of backpropagation

Published: 2024-02 Issue: 1 Volume: 50
ISSN:1019-7168
Container-title:Advances in Computational Mathematics
language:en
Short-container-title:Adv Comput Math

Author:

Karner Clemens, Kazeev Vladimir, Petersen Philipp Christian

Abstract

We study the training of deep neural networks by gradient descent where floating-point arithmetic is used to compute the gradients. In this framework and under realistic assumptions, we demonstrate that it is highly unlikely to find ReLU neural networks that maintain, in the course of training with gradient descent, superlinearly many affine pieces with respect to their number of layers. In virtually all approximation theoretical arguments which yield high order polynomial rates of approximation, sequences of ReLU neural networks with exponentially many affine pieces compared to their numbers of layers are used. As a consequence, we conclude that approximating sequences of ReLU neural networks resulting from gradient descent in practice differ substantially from theoretically constructed sequences. The assumptions and the theoretical results are compared to a numerical study, which yields concurring results.

Funder

Austrian Science Fund

Publisher

Springer Science and Business Media LLC

Link

https://link.springer.com/content/pdf/10.1007/s10444-024-10106-x.pdf

References (49 articles)

1. Arridge, S., Maass, P., Öktem, O., Schönlieb, C.-B.: Solving inverse problems using data-driven models. Acta Numer. 28, 1–174 (2019)

2. Bachmayr, M., Kazeev, V.: Stability and preconditioning of elliptic PDEs with low-rank multilevel structure. Found. Comput. Math. 20, 1175–1236 (2020)

3. Bhattacharya, K., Hosseini, B., Kovachki, N.B., Stuart, A.M.: Model reduction and neural networks for parametric PDEs. arXiv:2005.03180 (2020)

4. Boche, H., Fono, A., Kutyniok, G.: Limitations of deep learning for inverse problems on digital hardware. arXiv:2202.13490 (2022)

5. Bölcskei, H., Grohs, P., Kutyniok, G., Petersen, P.C.: Optimal approximation with sparsely connected deep neural networks. SIAM J. Math. Data Sci. 1, 8–45 (2019)