Affiliation:
1. School of Electronic Engineering, Soongsil University, Seoul 06978, Republic of Korea
Abstract
In recent years, advances in drone and remote sensing technologies have made object detection in unmanned aerial vehicle (UAV) imagery a prominent and crucial task. However, detecting targets in UAV images poses challenges such as complex backgrounds, severe occlusion, dense small targets, and varying lighting conditions. Despite the notable progress of deep-learning-based object detection algorithms, they still suffer from missed detections and false alarms. In this work, we introduce MCG-RTDETR, an approach based on the real-time detection transformer (RT-DETR) that adds dual and deformable convolution modules, a cascaded group attention module, a context-guided feature fusion structure with context-guided downsampling, and a more flexible prediction head for precise object detection in UAV imagery. Experimental results on the VisDrone2019 dataset show that our approach achieves the highest AP of 29.7% and AP50 of 58.2%, surpassing several cutting-edge algorithms. Visual results further validate the model’s robustness and capability in complex environments.
Funder
Ministry of Science and ICT
Institute for Information and Communications Technology Promotion