Abstract
Multiple hidden layers in deep neural networks perform non-linear transformations, enabling the extraction of meaningful features and the identification of relationships between input and output data. However, the gap between the training and real-world data can result in network overfitting, prompting the exploration of various preventive methods. The regularization technique called 'dropout' is widely used in deep learning models to encourage the learning of robust, generalized features. During the training phase with dropout, neurons in a particular layer are randomly selected to be ignored for each input. This random exclusion of neurons encourages the network to depend on different subsets of neurons at different times, fostering robustness and reducing sensitivity to specific neurons. This study introduces a novel approach called random focusing, which departs from the complete neuron exclusion of dropout. The proposed random focusing selectively highlights random neurons during training, aiming for a smoother transition between the training and inference phases while keeping the network architecture consistent. This study also incorporates the Jensen–Shannon Divergence to enhance the stability and efficacy of the random focusing method. Experimental validation across tasks such as image classification and semantic segmentation demonstrates the adaptability of the proposed methods across different network architectures, including convolutional neural networks and transformers.
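The abstract does not give the exact formulation of random focusing, so the sketch below is only an illustrative contrast under stated assumptions: standard (inverted) dropout is shown as described above, while the hypothetical `random_focus` function assumes that "focusing" means amplifying a random subset of activations by a made-up `gain` factor instead of zeroing them out; neither `random_focus` nor its parameters should be read as the authors' published method. The Jensen–Shannon divergence helper uses the standard definition.

```python
import numpy as np

def dropout(x, p=0.5, rng=np.random.default_rng(0)):
    """Standard (inverted) dropout: zero each activation with probability p
    and rescale the survivors by 1/(1-p) so the expected value is unchanged."""
    mask = rng.random(x.shape) >= p
    return x * mask / (1.0 - p)

def random_focus(x, p=0.5, gain=2.0, rng=np.random.default_rng(0)):
    """Hypothetical 'random focusing': instead of excluding neurons,
    amplify a random subset by `gain` while keeping the rest intact.
    This is an illustrative guess at the idea, not the paper's exact rule."""
    mask = rng.random(x.shape) < p
    scale = np.where(mask, gain, 1.0)
    return x * scale / scale.mean()  # keep the overall magnitude roughly stable

def js_divergence(p, q, eps=1e-12):
    """Jensen-Shannon divergence between two discrete distributions:
    JSD(P, Q) = 0.5 * KL(P || M) + 0.5 * KL(Q || M), with M = (P + Q) / 2."""
    p, q = p / p.sum(), q / q.sum()
    m = 0.5 * (p + q)
    kl = lambda a, b: np.sum(a * np.log((a + eps) / (b + eps)))
    return 0.5 * kl(p, m) + 0.5 * kl(q, m)

# Example: apply both stochastic transforms to the same activations
x = np.abs(np.random.default_rng(1).normal(size=8))
print(dropout(x))       # some activations are zeroed out
print(random_focus(x))  # all activations survive; a random subset is emphasized
print(js_divergence(np.array([0.7, 0.3]), np.array([0.5, 0.5])))
```

The contrast highlights the claim in the abstract: dropout changes which units participate between training and inference, whereas a focusing-style transform keeps every unit active, which is what would allow a smoother transition between the two phases.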
Funder
Japan Society for the Promotion of Science
Publisher
Springer Science and Business Media LLC