The effect of activation functions on accuracy, convergence speed, and misclassification confidence in CNN text classification: a comprehensive exploration

Published:2023-06-24 Issue:1 Volume:80 Page:292-312
ISSN:0920-8542
Container-title:The Journal of Supercomputing
language:en
Short-container-title:J Supercomput

Author:

Emanuel Rebecca H. K., Docherty Paul D., Lunt Helen, Möller Knut

Abstract

Convolutional neural networks (CNNs) have become a useful tool for a wide range of applications such as text classification. However, CNNs are not always sufficiently accurate to be useful in certain applications. The selection of activation functions within CNN architecture can affect the efficacy of the CNN. However, there is limited research regarding which activation functions are best for CNN text classification. This study tested sixteen activation functions across three text classification datasets and six CNN structures, to determine the effects of activation function on accuracy, iterations to convergence, and Positive Confidence Difference (PCD). PCD is a novel metric introduced to compare how activation functions affected a network’s classification confidence. Tables were presented to compare the performance of the activation functions across the different CNN architectures and datasets. Top performing activation functions across the different tests included the symmetrical multi-state activation function, sigmoid, penalised hyperbolic tangent, and generalised swish. An activation function’s PCD was the most consistent evaluation metric during activation function assessment, implying a close relationship between activation functions and network confidence that has yet to be explored.

Funder

College of Engineering, University of Canterbury

University of Canterbury

Publisher

Springer Science and Business Media LLC

Subject

Hardware and Architecture, Information Systems, Theoretical Computer Science, Software

Link

https://link.springer.com/content/pdf/10.1007/s11227-023-05441-7.pdf

Reference: 61 articles.

1. Minaee S, Kalchbrenner N, Cambria E, Nikzad N, Chenaghlu M, Gao J (2020) Deep learning based text classification: a comprehensive review. arXiv:2004.03705

2. Zhang Q, Wang Y, Gong Y, Huang X (2016) Automatic keyphrase extraction using recurrent neural networks. In: Proceedings of the 2016 Conference on Empirical Methods in Natural Language Processing, Austin, Texas. Association for Computational Linguistics. https://doi.org/10.18653/v1/D16-1080 https://www.aclweb.org/anthology/D16-1080