DFS-DETR: Detailed-Feature-Sensitive Detector for Small Object Detection in Aerial Images Using Transformer
-
Published:2024-08-27
Issue:17
Volume:13
Page:3404
-
ISSN:2079-9292
-
Container-title:Electronics
-
language:en
-
Short-container-title:Electronics
Author:
Cao Xinyu1, Wang Hanwei2, Wang Xiong1, Hu Bin3
Affiliation:
1. School of Information Science and Engineering, Yunnan University, Chenggong District, Kunming 650500, China 2. Civil Engineering, The University of Liverpool, Liverpool L3 8HA, UK 3. Department of Computer Science and Technology, Kean University, Union, NJ 07083, USA
Abstract
Object detection in aerial images plays a crucial role across diverse domains such as agriculture, environmental monitoring, and security. Aerial images present several challenges, including dense small objects, intricate backgrounds, and occlusions, necessitating robust detection algorithms. This paper addresses the critical need for accurate and efficient object detection in aerial images using a Transformer-based approach enhanced with specialized methodologies, termed DFS-DETR. The core framework leverages RT-DETR-R18, integrating the Cross Stage Partial Reparam Dilation-wise Residual Module (CSP-RDRM) to optimize feature extraction. Additionally, the introduction of the Detail-Sensitive Pyramid Network (DSPN) enhances sensitivity to local features, complemented by the Dynamic Scale Sequence Feature-Fusion Module (DSSFFM) for comprehensive multi-scale information integration. Moreover, Multi-Attention Add (MAA) is utilized to refine feature processing, which enhances the model’s capacity for understanding and representation by integrating various attention mechanisms. To improve bounding box regression, the model employs MPDIoU with normalized Wasserstein distance, which accelerates convergence. Evaluation across the VisDrone2019, AI-TOD, and NWPU VHR-10 datasets demonstrates significant improvements in the mean average precision (mAP) values: 24.1%, 24.0%, and 65.0%, respectively, surpassing RT-DETR-R18 by 2.3%, 4.8%, and 7.0%, respectively. Furthermore, the proposed method achieves real-time inference speeds. This approach can be deployed on drones to perform real-time ground detection.
Reference43 articles.
1. Liu, W., Anguelov, D., Erhan, D., Szegedy, C., Reed, S., Fu, C.Y., and Berg, A.C. (2016, January 11–14). Ssd: Single shot multibox detector. Proceedings of the Computer Vision—ECCV 2016: 14th European Conference, Amsterdam, The Netherlands. 2. Lin, T.Y., Dollár, P., Girshick, R., He, K., Hariharan, B., and Belongie, S. (2017, January 21–26). Feature pyramid networks for object detection. Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, Honolulu, HI, USA. 3. Liu, S., Qi, L., Qin, H., Shi, J., and Jia, J. (2018, January 18–23). Path aggregation network for instance segmentation. Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, Salt Lake City, UT, USA. 4. ARFP: A novel adaptive recursive feature pyramid for object detection in aerial images;Wang;Appl. Intell.,2022 5. Laplacian feature pyramid network for object detection in VHR optical remote sensing images;Zhang;IEEE Trans. Geosci. Remote Sens.,2021
|
|