MCG-RTDETR: Multi-Convolution and Context-Guided Network with Cascaded Group Attention for Object Detection in Unmanned Aerial Vehicle Imagery

Published:2024-08-27 Issue:17 Volume:16 Page:3169
ISSN:2072-4292
Container-title:Remote Sensing
language:en
Short-container-title:Remote Sensing

Author:

Yu Chushi¹, Shin Yoan¹

Affiliation:

1. School of Electronic Engineering, Soongsil University, Seoul 06978, Republic of Korea

Abstract

In recent years, advances in drone and remote sensing technologies have made object detection in unmanned aerial vehicle (UAV) imagery a prominent and crucial task. However, detecting targets in UAV images poses challenges such as complex backgrounds, severe occlusion, dense small targets, and varying lighting conditions. Despite notable progress, deep learning-based object detection algorithms still suffer from missed detections and false alarms. In this work, we introduce MCG-RTDETR, an approach based on the real-time detection transformer (RT-DETR) that incorporates dual and deformable convolution modules, a cascaded group attention module, a context-guided feature fusion structure with context-guided downsampling, and a more flexible prediction head for precise object detection in UAV imagery. Experimental results on the VisDrone2019 dataset show that our approach achieves the highest AP of 29.7% and AP50 of 58.2%, surpassing several state-of-the-art algorithms. Visual results further validate the model’s robustness and capability in complex environments.

Funder

Ministry of Science and ICT

Institute for Information and Communications Technology Promotion

Publisher

MDPI AG

Link

https://www.mdpi.com/2072-4292/16/17/3169/pdf

Reference: 45 articles.

1. Wu (2022). Deep learning for unmanned aerial vehicle-based object detection and tracking: A survey. Geosci. Remote Sens.

2. Liu, Z., Rodriguez-Opazo, C., Teney, D., and Gould, S. (2021, January 11–17). Image retrieval on real-life images with pre-trained vision-and-language models. Proceedings of the IEEE/CVF International Conference on Computer Vision, Montreal, QC, Canada.

3. Reis, D., Kupec, J., Hong, J., and Daoudi, A. (2023). Real-time flying object detection with YOLOv8. arXiv.

4. Ye, B., Chang, H., Ma, B., Shan, S., and Chen, X. (2022, January 23–27). Joint feature learning and relation modeling for tracking: A one-stream framework. Proceedings of the European Conference on Computer Vision, Tel Aviv, Israel.

5. Girshick, R. (2015, January 7–13). Fast r-cnn. Proceedings of the IEEE International Conference on Computer Vision, Santiago, Chile.