Drone-Based Visible–Thermal Object Detection with Transformers and Prompt Tuning
Author:
Chen Rui 1, Li Dongdong 1, Gao Zhinan 1, Kuai Yangliu 2, Wang Chengyuan 3
Affiliation:
1. College of Electronic Science and Technology, National University of Defense Technology, Changsha 410073, China
2. College of Intelligent Science and Technology, National University of Defense Technology, Changsha 410073, China
3. Information and Communication College, National University of Defense Technology, Wuhan 430010, China
Abstract
Visible–thermal object detection from unmanned aerial vehicles (UAVs) has emerged as a powerful technique for improving accuracy and robustness in challenging conditions such as low light and severe weather. However, most existing work relies on Convolutional Neural Network (CNN) frameworks, confining the Transformer’s attention mechanism to fusion modules and neglecting its potential for global feature modeling. To address this limitation, this study introduces a dual-modal object detection framework, Visual Prompt multi-modal Detection (VIP-Det), that adopts the Transformer architecture as the primary feature extractor and integrates vision prompts for refined feature fusion. Our approach first trains a single-modal baseline model to establish robust feature representations, then fine-tunes it with additional modal data and prompts. Experiments on the DroneVehicle dataset show that our algorithm achieves high accuracy, outperforming comparable Transformer-based methods. These findings indicate that the proposed method marks a significant advance in UAV-based object detection and holds considerable promise for autonomous surveillance and monitoring in varied and challenging environments.
Funder
National Natural Science Foundation of China
Scientific Research Project of National University of Defense Technology