Authors:
Zhang Zhiqiang, Li Xiaoming, Yang Yihe, Shi Zhiyong
Abstract
In the era of big data, efficient data processing has become a crucial issue for scientific development. Image classification, one of the core tasks in computer vision, is central to automated and intelligent applications. Nonlinear activation functions play a crucial role in neural networks: they introduce nonlinearity and improve a model's representation and learning ability. It is therefore essential to investigate how different nonlinear activation functions perform on image classification tasks in order to optimize model performance and improve data processing efficiency. This paper studies three nonlinear activation functions proposed by E. Pishchik in 2023, namely the cosine linear unit (CosLU), the derivative exponential linear unit (DELU), and the rectified linear unit with nonnegative slope (ReLUN), and evaluates their performance on image classification tasks. We selected two datasets, CIFAR-10 and CIFAR-100, and used each of these activation functions to train five progressively deeper network models, comparing them against the ReLU baseline; moving from CIFAR-10 to CIFAR-100 expands the number of classes and thus provides a more comprehensive evaluation. The experimental results show that on CIFAR-10 the CosLU activation function outperforms ReLU, DELU performs poorly, and ReLUN performs similarly to ReLU; on CIFAR-100, the effectiveness of all these activation functions decreases significantly. We also observed a characteristic shared by most activation functions with trainable parameters: the larger the model, the better their overall performance trend tends to become.
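As context for the abstract, the sketch below shows one way such a trainable activation could be implemented as a drop-in replacement for ReLU in PyTorch. It assumes CosLU takes the form (x + a*cos(b*x)) * sigmoid(x) with learnable scalars a and b; this parameterization is an assumption drawn from Pishchik's formulation and may not match the exact variant trained in the experiments.

```python
import torch
import torch.nn as nn


class CosLU(nn.Module):
    """Cosine Linear Unit with trainable parameters (sketch).

    Assumed form: CosLU(x) = (x + a * cos(b * x)) * sigmoid(x),
    where a and b are learnable scalars initialized to 1.
    """

    def __init__(self):
        super().__init__()
        self.a = nn.Parameter(torch.ones(1))  # amplitude of the cosine term
        self.b = nn.Parameter(torch.ones(1))  # frequency of the cosine term

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        return (x + self.a * torch.cos(self.b * x)) * torch.sigmoid(x)


# Usage: swap CosLU() in where nn.ReLU() would normally sit in a
# small CIFAR-style convolutional block (illustrative only).
block = nn.Sequential(
    nn.Conv2d(3, 32, kernel_size=3, padding=1),
    nn.BatchNorm2d(32),
    CosLU(),
)
```

Because a and b are nn.Parameter objects, they are updated by the optimizer alongside the convolutional weights, which is what gives such activations their model-size-dependent behaviour noted in the abstract.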
Publisher:
Research Square Platform LLC