Hybrid Cross-Feature Interaction Attention Module for Object Detection in Intelligent Mobile Scenes
Published: 2023-10-17
Journal: Remote Sensing
Volume: 15, Issue: 20, Page: 4991
ISSN: 2072-4292
Language: en
Authors:
Tian Di 1,2, Han Yi 2, Liu Yongtao 2, Li Jiabo 1, Zhang Ping 2, Liu Ming 3
Affiliations:
1. Mechanical Engineering College, Xi’an Shiyou University, Xi’an 710065, China
2. School of Automobile, Chang’an University, Xi’an 710064, China
3. Department of Electronic and Computer Engineering, The Hong Kong University of Science and Technology, Hong Kong, China
Abstract
Object detection is a fundamental task in computer vision and is of great importance in intelligent mobile scenes. This paper proposes a hybrid cross-feature interaction (HCFI) attention module for object detection in intelligent mobile scenes. First, we introduce multiple-kernel (MK) spatial pyramid pooling (SPP), built on SPP, and use its structure to improve channel attention, yielding a hybrid cross-channel interaction (HCCI) attention module with better cross-channel interaction performance. We also strengthen spatial attention by incorporating dilated convolutions, producing a cross-spatial interaction (CSI) attention module with better cross-spatial interaction performance. Combining these two modules yields the improved HCFI attention module without resorting to computationally expensive operations. In experiments across multiple detectors and datasets, the proposed method consistently performs well, improving YOLOX by 1.53% on COCO and YOLOv5 by 2.05% on BDD100K. Furthermore, we propose a combination of HCCI and HCFI to address the challenge of extremely small output feature layers in detectors such as SSD. The experimental results indicate that the proposed method significantly improves the attention capability of object detection in intelligent mobile scenes.
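To make the abstract's pipeline concrete, the following is a minimal NumPy sketch of the general pattern it describes: a channel-attention branch fed by multi-scale pooled descriptors with local cross-channel mixing, followed by a spatial-attention branch that uses a dilated neighborhood to widen the receptive field. This is not the authors' implementation; the pooling scales, the 1-D mixing kernel, the dilated 3x3 averaging, and the fusion order are illustrative assumptions only.

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def channel_attention(x, k=3):
    # x: (C, H, W). Multi-scale channel descriptors via global average
    # pooling plus a 2x2-grid pooling (an SPP-like multi-kernel summary).
    C, H, W = x.shape
    g1 = x.mean(axis=(1, 2))                                  # (C,)
    h2, w2 = H // 2, W // 2
    g2 = x[:, :2 * h2, :2 * w2].reshape(C, 2, h2, 2, w2).mean(axis=(2, 4))
    desc = g1 + g2.mean(axis=(1, 2))                          # fuse scales
    # Cross-channel interaction: a small 1-D convolution over the
    # channel axis mixes each channel with its neighbors.
    kernel = np.ones(k) / k
    mixed = np.convolve(desc, kernel, mode="same")
    return sigmoid(mixed)                                     # (C,) weights

def spatial_attention(x, dilation=2):
    # Channel-wise mean map, then a dilated 3x3 average so each output
    # location aggregates a wider spatial context.
    m = x.mean(axis=0)                                        # (H, W)
    H, W = m.shape
    pad = dilation
    mp = np.pad(m, pad, mode="edge")
    acc = np.zeros_like(m)
    for dy in (-dilation, 0, dilation):
        for dx in (-dilation, 0, dilation):
            acc += mp[pad + dy:pad + dy + H, pad + dx:pad + dx + W]
    return sigmoid(acc / 9.0)                                 # (H, W) weights

def hybrid_attention(x):
    cw = channel_attention(x)            # channel branch
    xw = x * cw[:, None, None]
    sw = spatial_attention(xw)           # spatial branch on reweighted map
    return xw * sw[None, :, :]

rng = np.random.default_rng(0)
feat = rng.standard_normal((8, 16, 16)).astype(np.float32)
out = hybrid_attention(feat)
print(out.shape)  # (8, 16, 16)
```

Because both branches produce weights in (0, 1), the output is an element-wise attenuation of the input feature map; in a real detector these branches would be learned convolutions rather than fixed averages.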
Funder
National Key Research and Development Program of China; Natural Science Foundation of Shaanxi Province; Key Research and Development Program of Shaanxi Province
Subject
General Earth and Planetary Sciences