Affiliation:
1. Albstadt-Sigmaringen University, Albstadt 72458, Germany; knoblauch@hs-albsig.de
Abstract
Supervised learning corresponds to minimizing a loss or cost function that expresses the differences between the model predictions y_n and the target values t_n given by the training data. In neural networks, this means backpropagating error signals through the transposed weight matrices from the output layer toward the input layer. For this, the error signals in the output layer are typically initialized by the difference y_n - t_n, which is optimal for several commonly used loss functions such as cross-entropy or the sum of squared errors. Here I evaluate a more general error-initialization method using power functions |y_n - t_n|^q for q > 0, corresponding to a new family of loss functions that generalize cross-entropy. Surprisingly, experiments on various learning tasks reveal that a proper choice of q can significantly improve the speed and convergence of backpropagation learning, in particular in deep and recurrent neural networks. The results suggest two main reasons for the observed improvements. First, compared to cross-entropy, the new loss functions provide better fits to the distribution of error signals in the output layer and therefore maximize the model's likelihood more efficiently. Second, the new error-initialization procedure may often provide a better gradient-to-loss ratio over a broad range of neural output activity, thereby avoiding flat loss landscapes with vanishing gradients.
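The idea can be illustrated with a minimal sketch for a feed-forward network. The function names (init_output_error, backprop_errors), the layer/weight shapes, and the sign factor that preserves the direction of the error are illustrative assumptions, not the paper's exact implementation; with q = 1 the initialization reduces to the standard difference y_n - t_n.

import numpy as np

def init_output_error(y, t, q=1.0):
    # Output-layer error initialization.  q = 1 recovers the standard
    # difference y - t (optimal for cross-entropy or sum-of-squared-errors
    # losses); other q > 0 gives the power-function initialization
    # |y - t|^q.  The sign factor is an assumption made here so that the
    # error keeps its direction.
    diff = y - t
    return np.sign(diff) * np.abs(diff) ** q

def backprop_errors(delta_out, weights, pre_activations, d_act):
    # Propagate error signals from the output layer toward the input layer
    # through the transposed weight matrices (standard backpropagation).
    # weights[l] maps layer l to layer l+1, pre_activations[l] is the input
    # to layer l's nonlinearity, and d_act is that nonlinearity's
    # derivative.  Names and shapes are illustrative.
    deltas = [delta_out]
    for W, z in zip(reversed(weights), reversed(pre_activations[:-1])):
        deltas.insert(0, (deltas[0] @ W.T) * d_act(z))
    return deltas

# Usage sketch: for a network output y and targets t, initialize the
# output-layer error with, e.g., q = 2 and backpropagate as usual:
#   delta_out = init_output_error(y, t, q=2.0)
#   deltas = backprop_errors(delta_out, weights, pre_activations, d_act)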
Subject
Cognitive Neuroscience, Arts and Humanities (miscellaneous)