Abstract
Multiple hidden layers in deep neural networks perform non-linear transformations, enabling the extraction of meaningful features and the identification of relationships between input and output data. However, the gap between the training and real-world data can result in network overfitting, prompting the exploration of various preventive methods. The regularization technique called 'dropout' is widely used in deep learning models to encourage the learning of robust, generalized features. During the training phase with dropout, neurons in a particular layer are randomly selected to be ignored for each input. This random exclusion of neurons encourages the network to depend on different subsets of neurons at different times, fostering robustness and reducing sensitivity to specific neurons. This study introduces a novel approach called random focusing, which departs from the complete neuron exclusion of dropout. The proposed random focusing selectively highlights random neurons during training, aiming for a smoother transition between the training and inference phases while keeping the network architecture consistent. This study also incorporates the Jensen–Shannon Divergence to enhance the stability and efficacy of the random focusing method. Experimental validation across tasks such as image classification and semantic segmentation demonstrates the adaptability of the proposed methods across different network architectures, including convolutional neural networks and transformers.
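The abstract does not give the exact formulation of random focusing, so the sketch below is only an illustrative contrast under stated assumptions: standard (inverted) dropout is shown as described above, while the hypothetical `random_focus` function assumes that "focusing" means amplifying a random subset of activations by a made-up `gain` factor instead of zeroing them out; neither `random_focus` nor its parameters should be read as the authors' published method. The Jensen–Shannon divergence helper uses the standard definition.

```python
import numpy as np

def dropout(x, p=0.5, rng=np.random.default_rng(0)):
    """Standard (inverted) dropout: zero each activation with probability p
    and rescale the survivors by 1/(1-p) so the expected value is unchanged."""
    mask = rng.random(x.shape) >= p
    return x * mask / (1.0 - p)

def random_focus(x, p=0.5, gain=2.0, rng=np.random.default_rng(0)):
    """Hypothetical 'random focusing': instead of excluding neurons,
    amplify a random subset by `gain` while keeping the rest intact.
    This is an illustrative guess at the idea, not the paper's exact rule."""
    mask = rng.random(x.shape) < p
    scale = np.where(mask, gain, 1.0)
    return x * scale / scale.mean()  # keep the overall magnitude roughly stable

def js_divergence(p, q, eps=1e-12):
    """Jensen-Shannon divergence between two discrete distributions:
    JSD(P, Q) = 0.5 * KL(P || M) + 0.5 * KL(Q || M), with M = (P + Q) / 2."""
    p, q = p / p.sum(), q / q.sum()
    m = 0.5 * (p + q)
    kl = lambda a, b: np.sum(a * np.log((a + eps) / (b + eps)))
    return 0.5 * kl(p, m) + 0.5 * kl(q, m)

# Example: apply both stochastic transforms to the same activations
x = np.abs(np.random.default_rng(1).normal(size=8))
print(dropout(x))       # some activations are zeroed out
print(random_focus(x))  # all activations survive; a random subset is emphasized
print(js_divergence(np.array([0.7, 0.3]), np.array([0.5, 0.5])))
```

The contrast highlights the claim in the abstract: dropout changes which units participate between training and inference, whereas a focusing-style transform keeps every unit active, which is what would allow a smoother transition between the two phases.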
Funder
Japan Society for the Promotion of Science
Publisher
Springer Science and Business Media LLC