Affiliations:
1. School of Computer Science, University of St Andrews, St Andrews KY16 9SS, UK
2. Institute of Information Science and Technologies, Italian National Research Council (CNR), 56124 Pisa, Italy
Abstract
Cross-entropy loss is crucial in training many deep neural networks. In this context, we show a number of novel and strong correlations among various related divergence functions. In particular, we demonstrate that, in some circumstances, (a) cross-entropy is almost perfectly correlated with the little-known triangular divergence, and (b) cross-entropy is strongly correlated with the Euclidean distance over the logits from which the softmax is derived. The consequences of these observations are as follows. First, triangular divergence may be used as a cheaper alternative to cross-entropy. Second, logits can be used as features in a Euclidean space which is strongly synergistic with the classification process. This justifies the use of Euclidean distance over logits as a measure of similarity, in cases where the network is trained using softmax and cross-entropy. We establish these correlations via empirical observation, supported by a mathematical explanation encompassing a number of strongly related divergence functions.
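For readers who want to probe claim (a) numerically, the following minimal NumPy sketch (ours, not taken from the paper) compares cross-entropy H(p, q) = -sum_i p_i log q_i with the triangular divergence TD(p, q) = sum_i (p_i - q_i)^2 / (p_i + q_i) over a batch of synthetic softmax predictions and one-hot targets, and reports their Pearson correlation. The random-logit setup is an illustrative assumption, so the correlation measured here is only indicative of the much stronger in-training relationship the paper reports.

    import numpy as np

    rng = np.random.default_rng(0)

    def softmax(z):
        # Numerically stable softmax over the last axis.
        e = np.exp(z - z.max(axis=-1, keepdims=True))
        return e / e.sum(axis=-1, keepdims=True)

    def cross_entropy(p, q, eps=1e-12):
        # H(p, q) = -sum_i p_i * log(q_i)
        return -(p * np.log(q + eps)).sum(axis=-1)

    def triangular_divergence(p, q):
        # TD(p, q) = sum_i (p_i - q_i)^2 / (p_i + q_i), with 0/0 taken as 0.
        num = (p - q) ** 2
        den = p + q
        return np.divide(num, den, out=np.zeros_like(num), where=den > 0).sum(axis=-1)

    # Toy setup (our assumption, not the paper's experiment): random logits
    # for a 10-class problem, softmax predictions, one-hot targets.
    n_samples, n_classes = 10_000, 10
    logits = rng.normal(size=(n_samples, n_classes))
    q = softmax(logits)                                           # predicted distributions
    p = np.eye(n_classes)[rng.integers(0, n_classes, n_samples)]  # one-hot targets

    ce = cross_entropy(p, q)
    td = triangular_divergence(p, q)
    print("Pearson r(cross-entropy, triangular divergence):", np.corrcoef(ce, td)[0, 1])

Note that with one-hot targets TD reduces to 2(1 - q_c)/(1 + q_c), where q_c is the probability assigned to the true class, so both losses are monotone functions of q_c; this is the kind of structural link that the paper's mathematical explanation presumably formalizes.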
Cited by: 1 article.