Optimized Dropkey-Based Grad-CAM: Toward Accurate Image Feature Localization
Author:
Liu Yiwei 1, Tang Luping 1,2, Liao Chen 3, Zhang Chun 1, Guo Yingqing 1 (ORCID), Xia Yixuan 1, Zhang Yangyang 1, Yao Sisi 1
Affiliation:
1. College of Mechanical and Electrical Engineering, Nanjing Forestry University, Nanjing 210037, China
2. SEU-FEI Nano-Pico Center, Key Lab of MEMS of Ministry of Education, Southeast University, Nanjing 210096, China
3. College of Electronic and Optical Engineering & College of Flexible Electronics (Future Technology), Nanjing University of Posts and Telecommunications, Nanjing 210023, China
Abstract
Among interpretability techniques for image recognition, Gradient-weighted Class Activation Mapping (Grad-CAM) is widely used for feature localization because of its broad applicability: it reveals the evidence behind a neural network's decisions. However, extensive experimentation on a customized dataset revealed that deep convolutional neural network (CNN) models based on Grad-CAM cannot effectively resist large-scale noise interference. In this article, the deep CNN model is optimized by incorporating the Dropkey algorithm, with Dropout as a comparison. Unlike standard Grad-CAM, the improved Dropkey-based Grad-CAM applies an attention mechanism to the feature map before the gradient is computed, introducing randomness by masking a portion of the attention scores and thereby suppressing some regions. Experimental results show that the optimized Dropkey-based Grad-CAM deep CNN model effectively resists large-scale noise interference and localizes image features accurately. For instance, under noise with a variance of 0.6, the Dropkey-enhanced ResNet50 model predicts with a confidence of 0.878, while the other two models reach only 0.766 and 0.481, respectively. Moreover, the model performs well on visualization tasks involving image distortion, low contrast, and small-object features. It also holds promise for practical computer vision applications: in autonomous driving, for example, it can help verify whether a deep learning model correctly attends to crucial objects, road signs, pedestrians, and other elements of the environment.
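The core idea described in the abstract, masking attention scores before they are normalized rather than dropping the attention weights afterward (as Dropout does), can be illustrated with a minimal sketch. This is not the authors' implementation; the function name and shapes are illustrative assumptions, written in plain NumPy:

```python
import numpy as np

def dropkey_attention(q, k, v, mask_ratio=0.1, training=True, rng=None):
    """Scaled dot-product attention with Dropkey-style masking (sketch).

    Dropkey randomly masks attention logits *before* the softmax, so the
    surviving keys re-normalize to sum to 1; ordinary Dropout instead
    zeroes attention weights *after* the softmax.
    """
    rng = rng or np.random.default_rng()
    d_k = q.shape[-1]
    scores = q @ k.swapaxes(-2, -1) / np.sqrt(d_k)  # raw attention logits
    if training and mask_ratio > 0:
        # Bernoulli mask: each (query, key) score is dropped with prob. mask_ratio
        drop = rng.random(scores.shape) < mask_ratio
        scores = np.where(drop, -np.inf, scores)
    # numerically stable softmax over the key axis
    scores = scores - scores.max(axis=-1, keepdims=True)
    weights = np.exp(scores)
    weights /= weights.sum(axis=-1, keepdims=True)
    return weights @ v, weights
```

In the paper's pipeline, the Grad-CAM gradients would then be taken through this masked attention over the feature map; setting `training=False` recovers plain attention at inference time.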
Funder
National Natural Science Foundation of China; Postdoctoral Science Foundation of China; Fundamental Research Funds for the Central Universities; Open Research Fund of Key Laboratory of MEMS of Ministry of Education, Southeast University; Nanjing Forestry University College Student Innovation Training Program; NUPTSF
Subject
Electrical and Electronic Engineering; Biochemistry; Instrumentation; Atomic and Molecular Physics, and Optics; Analytical Chemistry
Cited by
3 articles.