Hybrid Cross-Feature Interaction Attention Module for Object Detection in Intelligent Mobile Scenes
Published: 2023-10-17
Journal: Remote Sensing
Volume: 15, Issue: 20, Page: 4991
ISSN: 2072-4292
Language: en
Authors:
Tian Di 1,2, Han Yi 2, Liu Yongtao 2, Li Jiabo 1, Zhang Ping 2, Liu Ming 3
Affiliations:
1. Mechanical Engineering College, Xi’an Shiyou University, Xi’an 710065, China
2. School of Automobile, Chang’an University, Xi’an 710064, China
3. Department of Electronic and Computer Engineering, The Hong Kong University of Science and Technology, Hong Kong, China
Abstract
Object detection is a fundamental task in computer vision and is of great importance in intelligent mobile scenes. This paper proposes a hybrid cross-feature interaction (HCFI) attention module for object detection in intelligent mobile scenes. First, we introduce multiple-kernel (MK) spatial pyramid pooling (SPP), built on SPP, and use its structure to improve channel attention, yielding a hybrid cross-channel interaction (HCCI) attention module with better cross-channel interaction performance. We also strengthen spatial attention by incorporating dilated convolutions, producing a cross-spatial interaction (CSI) attention module with better cross-spatial interaction performance. Combining these two modules yields the improved HCFI attention module without resorting to computationally expensive operations. In experiments across multiple detectors and datasets, the proposed method consistently performs well, improving YOLOX by 1.53% on COCO and YOLOv5 by 2.05% on BDD100K. Furthermore, we propose a combination of HCCI and HCFI to address the challenge of extremely small output feature layers in detectors such as SSD. The experimental results indicate that the proposed method significantly improves the attention capability of object detection in intelligent mobile scenes.
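To make the abstract's pipeline concrete, the following is a minimal NumPy sketch of the general pattern it describes: a channel-attention branch fed by multi-scale pooled descriptors with local cross-channel mixing, followed by a spatial-attention branch that uses a dilated neighborhood to widen the receptive field. This is not the authors' implementation; the pooling scales, the 1-D mixing kernel, the dilated 3x3 averaging, and the fusion order are illustrative assumptions only.

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def channel_attention(x, k=3):
    # x: (C, H, W). Multi-scale channel descriptors via global average
    # pooling plus a 2x2-grid pooling (an SPP-like multi-kernel summary).
    C, H, W = x.shape
    g1 = x.mean(axis=(1, 2))                                  # (C,)
    h2, w2 = H // 2, W // 2
    g2 = x[:, :2 * h2, :2 * w2].reshape(C, 2, h2, 2, w2).mean(axis=(2, 4))
    desc = g1 + g2.mean(axis=(1, 2))                          # fuse scales
    # Cross-channel interaction: a small 1-D convolution over the
    # channel axis mixes each channel with its neighbors.
    kernel = np.ones(k) / k
    mixed = np.convolve(desc, kernel, mode="same")
    return sigmoid(mixed)                                     # (C,) weights

def spatial_attention(x, dilation=2):
    # Channel-wise mean map, then a dilated 3x3 average so each output
    # location aggregates a wider spatial context.
    m = x.mean(axis=0)                                        # (H, W)
    H, W = m.shape
    pad = dilation
    mp = np.pad(m, pad, mode="edge")
    acc = np.zeros_like(m)
    for dy in (-dilation, 0, dilation):
        for dx in (-dilation, 0, dilation):
            acc += mp[pad + dy:pad + dy + H, pad + dx:pad + dx + W]
    return sigmoid(acc / 9.0)                                 # (H, W) weights

def hybrid_attention(x):
    cw = channel_attention(x)            # channel branch
    xw = x * cw[:, None, None]
    sw = spatial_attention(xw)           # spatial branch on reweighted map
    return xw * sw[None, :, :]

rng = np.random.default_rng(0)
feat = rng.standard_normal((8, 16, 16)).astype(np.float32)
out = hybrid_attention(feat)
print(out.shape)  # (8, 16, 16)
```

Because both branches produce weights in (0, 1), the output is an element-wise attenuation of the input feature map; in a real detector these branches would be learned convolutions rather than fixed averages.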
Funder
National Key Research and Development Program of China; Natural Science Foundation of Shaanxi Province; Key Research and Development Program of Shaanxi Province
Subject
General Earth and Planetary Sciences