DCEF2-YOLO: Aerial Detection YOLO with Deformable Convolution–Efficient Feature Fusion for Small Target Detection

Published:2024-03-18 Issue:6 Volume:16 Page:1071
ISSN:2072-4292
Container-title:Remote Sensing

Author:

Shin Yeonha¹, Shin Heesub², Ok Jaewoo², Back Minyoung², Youn Jaehyuk², Kim Sungho¹

Affiliation:

1. Advanced Visual Intelligence Laboratory, Department of Electronic Engineering, Yeungnam University, 280 Daehak-ro, Gyeongsan 38541, Republic of Korea

2. LIG Nex1 Co., Ltd., Yongin 16911, Republic of Korea

Abstract

Deep learning technology for real-time small object detection in aerial images can be used in various industrial environments, such as real-time traffic surveillance and military reconnaissance. However, detecting small objects that occupy few pixels at low resolution remains a challenging problem that requires performance improvement. To improve the performance of small object detection, we propose DCEF2-YOLO. The proposed method enables efficient real-time small object detection by using a deformable convolution (DFConv) module and an efficient feature fusion structure to maximize the use of the internal feature information of objects. DFConv preserves small object information by preventing object information from mixing with the background. The optimized feature fusion structure produces high-quality feature maps for efficient real-time small object detection while maximizing the use of limited information. Additionally, modifying the input data processing stage and reducing the detection layer to suit small object detection also contribute to the performance improvement. Compared with the latest YOLO-based models (such as DCN-YOLO and YOLOv7), DCEF2-YOLO improves mAP by 6.1% on the DOTA-v1.0 test set, 0.3% on the NWPU VHR-10 test set, and 1.5% on the VEDAI512 test set. Furthermore, it processes 512 × 512 images at 120.48 FPS on an RTX 3090, making it suitable for real-time small object detection tasks.
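The abstract's claim that DFConv keeps object information from mixing with the background can be illustrated with a toy example. The following is a minimal pure-Python sketch of the general deformable convolution idea (each kernel tap is shifted by an offset and read with bilinear interpolation); it is not the authors' implementation. The function names `bilinear_sample` and `deformable_conv_at` are hypothetical, and the offsets are hand-picked here, whereas in a real deformable convolution they are predicted by a learned layer.

```python
import math


def bilinear_sample(img, y, x):
    """Sample a 2D list-of-lists image at fractional (y, x); outside counts as zero."""
    h, w = len(img), len(img[0])
    y0, x0 = math.floor(y), math.floor(x)
    value = 0.0
    for yi in (y0, y0 + 1):
        for xi in (x0, x0 + 1):
            if 0 <= yi < h and 0 <= xi < w:
                # Bilinear weight falls off linearly with distance to the pixel.
                weight = (1.0 - abs(y - yi)) * (1.0 - abs(x - xi))
                value += weight * img[yi][xi]
    return value


def deformable_conv_at(img, kernel, offsets, cy, cx):
    """Output of a 3x3 deformable convolution centred on pixel (cy, cx).

    offsets[i][j] is a (dy, dx) pair added to the regular grid position
    of kernel tap (i, j). All-zero offsets give an ordinary convolution.
    """
    total = 0.0
    for i in range(3):
        for j in range(3):
            dy, dx = offsets[i][j]
            total += kernel[i][j] * bilinear_sample(
                img, cy + (i - 1) + dy, cx + (j - 1) + dx
            )
    return total


# Toy 5x5 image: a one-pixel "object" (value 1) on a zero background.
img = [[0.0] * 5 for _ in range(5)]
img[2][2] = 1.0

# Averaging kernel, so the output shows how much background gets mixed in.
kernel = [[1.0 / 9.0] * 3 for _ in range(3)]

# Regular grid: no offsets (assumption: stands in for a standard convolution).
zero_offsets = [[(0, 0)] * 3 for _ in range(3)]

# Hand-picked offsets that pull every tap back onto the object pixel.
object_offsets = [[(-(i - 1), -(j - 1)) for j in range(3)] for i in range(3)]

regular = deformable_conv_at(img, kernel, zero_offsets, 2, 2)
deformed = deformable_conv_at(img, kernel, object_offsets, 2, 2)
print(f"regular grid: {regular:.3f}, deformed grid: {deformed:.3f}")
```

With the regular grid, eight of the nine taps fall on background, so the object's response is diluted to about 0.11. With offsets that concentrate the taps on the object, the response stays at about 1.0, which is the intuition behind sampling that adapts to the object's extent.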

Funder

Korea Research Institute for Defense Technology Planning and Advancement

Defense Acquisition Program Administration

Publisher

MDPI AG

Link

https://www.mdpi.com/2072-4292/16/6/1071/pdf


Cited by 1 article.

1. One-Year-Old Precocious Chinese Mitten Crab Identification Algorithm Based on Task Alignment;Animals;2024-07-21