Optimized Dropkey-Based Grad-CAM: Toward Accurate Image Feature Localization
Author:
Liu Yiwei 1, Tang Luping 1,2, Liao Chen 3, Zhang Chun 1, Guo Yingqing 1 (ORCID), Xia Yixuan 1, Zhang Yangyang 1, Yao Sisi 1
Affiliation:
1. College of Mechanical and Electrical Engineering, Nanjing Forestry University, Nanjing 210037, China
2. SEU-FEI Nano-Pico Center, Key Lab of MEMS of Ministry of Education, Southeast University, Nanjing 210096, China
3. College of Electronic and Optical Engineering & College of Flexible Electronics (Future Technology), Nanjing University of Posts and Telecommunications, Nanjing 210023, China
Abstract
Among interpretability techniques for image recognition, Gradient-weighted Class Activation Mapping (Grad-CAM) is widely used for feature localization because of its broad applicability: it reveals the evidence behind a neural network's decisions. However, extensive experimentation on a customized dataset revealed that deep convolutional neural network (CNN) models based on Grad-CAM cannot effectively resist large-scale noise interference. In this article, the deep CNN model is optimized by incorporating the Dropkey algorithm, with Dropout as a comparison. Unlike standard Grad-CAM, the improved Dropkey-based Grad-CAM applies an attention mechanism to the feature map before the gradient is computed, introducing randomness by masking a portion of the attention scores and thereby suppressing some regions. Experimental results show that the optimized Dropkey-based Grad-CAM deep CNN model effectively resists large-scale noise interference and localizes image features accurately. For instance, under noise with a variance of 0.6, the Dropkey-enhanced ResNet50 model predicts with a confidence of 0.878, while the other two models reach only 0.766 and 0.481, respectively. Moreover, the model performs well on visualization tasks involving image distortion, low contrast, and small-object features. It also holds promise for practical computer vision applications: in autonomous driving, for example, it can help verify whether a deep learning model correctly attends to crucial objects, road signs, pedestrians, and other elements of the environment.
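The core idea described in the abstract, masking attention scores before they are normalized rather than dropping the attention weights afterward (as Dropout does), can be illustrated with a minimal sketch. This is not the authors' implementation; the function name and shapes are illustrative assumptions, written in plain NumPy:

```python
import numpy as np

def dropkey_attention(q, k, v, mask_ratio=0.1, training=True, rng=None):
    """Scaled dot-product attention with Dropkey-style masking (sketch).

    Dropkey randomly masks attention logits *before* the softmax, so the
    surviving keys re-normalize to sum to 1; ordinary Dropout instead
    zeroes attention weights *after* the softmax.
    """
    rng = rng or np.random.default_rng()
    d_k = q.shape[-1]
    scores = q @ k.swapaxes(-2, -1) / np.sqrt(d_k)  # raw attention logits
    if training and mask_ratio > 0:
        # Bernoulli mask: each (query, key) score is dropped with prob. mask_ratio
        drop = rng.random(scores.shape) < mask_ratio
        scores = np.where(drop, -np.inf, scores)
    # numerically stable softmax over the key axis
    scores = scores - scores.max(axis=-1, keepdims=True)
    weights = np.exp(scores)
    weights /= weights.sum(axis=-1, keepdims=True)
    return weights @ v, weights
```

In the paper's pipeline, the Grad-CAM gradients would then be taken through this masked attention over the feature map; setting `training=False` recovers plain attention at inference time.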
Funder
National Natural Science Foundation of China; Postdoctoral Science Foundation of China; Fundamental Research Funds for the Central Universities; Open Research Fund of Key Laboratory of MEMS of Ministry of Education, Southeast University; Nanjing Forestry University College Student Innovation Training Program; NUPTSF
Subject
Electrical and Electronic Engineering; Biochemistry; Instrumentation; Atomic and Molecular Physics, and Optics; Analytical Chemistry
Cited by
3 articles.