THE ROLE OF STIFFNESS IN TRAINING AND GENERALIZATION OF RESNETS
Published: 2023
Issue: 2
Volume: 4
Pages: 75-103
ISSN: 2689-3967
Container-title: Journal of Machine Learning for Modeling and Computing
Language: en
Short-container-title: J Mach Learn Model Comput
Authors: Joshua Hudson, Marta D'Elia, Habib N. Najm, Khachik Sargsyan
Abstract
Neural ordinary differential equations (NODEs) have recently regained popularity as large-depth limits of a large class of neural networks. In particular, residual neural networks (ResNets) are equivalent to an explicit Euler discretization of an underlying NODE, where the transition from one layer to the next is one time step of the discretization. The relationship between continuous and discrete neural networks has been of particular interest. Notably, analysis from the ordinary differential equation viewpoint can potentially lead to new insights for understanding the behavior of neural networks in general. In this work, we take inspiration from differential equations to define the concept of stiffness for a ResNet via the interpretation of a ResNet as the discretization of a NODE. We then examine the effects of stiffness on the ability of a ResNet to generalize, via computational studies on example problems coming from climate and chemistry models. We find that penalizing stiffness does have a unique regularizing effect, but we see no benefit to penalizing stiffness over L2 regularization (penalization of network parameter norms) in terms of predictive performance.
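The ResNet/Euler correspondence described in the abstract can be illustrated with a minimal NumPy sketch. Everything named here is an assumption for illustration only: the tanh residual branch `residual_fn`, the step size `h`, and the helper names are hypothetical and not taken from the paper. The `stiffness_ratio` function uses the classical ODE stiffness ratio (largest over smallest magnitude of the real parts of the Jacobian eigenvalues) as a stand-in; the abstract does not state the paper's own definition of stiffness, which may differ.

```python
import numpy as np


def residual_fn(x, W, b):
    """Hypothetical residual branch f(x; W, b) = tanh(W x + b)."""
    return np.tanh(W @ x + b)


def resnet_forward(x0, weights, biases, h=1.0):
    """Forward pass of a ResNet written as explicit Euler steps.

    Each layer applies x_{k+1} = x_k + h * f(x_k; theta_k), which is one
    explicit Euler step of dx/dt = f(x; theta(t)) with step size h.
    """
    x = np.asarray(x0, dtype=float)
    for W, b in zip(weights, biases):
        x = x + h * residual_fn(x, W, b)
    return x


def jacobian(x, W, b):
    """Jacobian of the assumed residual branch with respect to x.

    d/dx tanh(W x + b) = diag(1 - tanh(W x + b)^2) @ W
    """
    s = 1.0 - np.tanh(W @ x + b) ** 2
    return s[:, None] * W


def stiffness_ratio(x, W, b, eps=1e-12):
    """Classical ODE stiffness ratio at state x (assumption, not the paper's).

    Ratio of the largest to the smallest |Re(lambda)| over the eigenvalues
    of the Jacobian; eps guards against division by zero.
    """
    re = np.abs(np.linalg.eigvals(jacobian(x, W, b)).real)
    return re.max() / max(re.min(), eps)


def l2_penalty(weights, biases):
    """L2 regularization term: sum of squared parameter entries."""
    return sum(np.sum(W ** 2) + np.sum(b ** 2) for W, b in zip(weights, biases))


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    depth, dim = 8, 3
    weights = [rng.normal(scale=0.5, size=(dim, dim)) for _ in range(depth)]
    biases = [np.zeros(dim) for _ in range(depth)]
    x0 = rng.normal(size=dim)

    # Depth 8 with h = 1/8 integrates the underlying ODE over t in [0, 1].
    print(resnet_forward(x0, weights, biases, h=1.0 / depth))
    print(stiffness_ratio(x0, weights[0], biases[0]))
    print(l2_penalty(weights, biases))
```

As a sanity check on the stand-in measure, a layer with `W = diag(-1, -100)` and zero bias evaluated at `x = 0` has Jacobian eigenvalues -1 and -100, so its ratio should be about 100: the two state components evolve on time scales that differ by two orders of magnitude, which is the usual informal meaning of stiffness.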
Cited by
2 articles.