YOLOv4 with Deformable-Embedding-Transformer Feature Extractor for Exact Object Detection in Aerial Imagery-Reference-Cited by-同舟云学术

YOLOv4 with Deformable-Embedding-Transformer Feature Extractor for Exact Object Detection in Aerial Imagery

Published:2023-02-24 Issue:5 Volume:23 Page:2522
ISSN:1424-8220
Container-title:Sensors
language:en
Short-container-title:Sensors

Author:

Wu Yiheng¹,Li Jianjun¹

Affiliation:

1. College of Computer and Information Engineering, Central South University of Forestry and Technology University, Changsha 410004, China

Abstract

The deep learning method for natural-image object detection tasks has made tremendous progress in recent decades. However, due to multiscale targets, complex backgrounds, and high-scale small targets, methods from the field of natural images frequently fail to produce satisfactory results when applied to aerial images. To address these problems, we proposed the DET-YOLO enhancement based on YOLOv4. Initially, we employed a vision transformer to acquire highly effective global information extraction capabilities. In the transformer, we proposed deformable embedding instead of linear embedding and a full convolution feedforward network (FCFN) instead of a feedforward network in order to reduce the feature loss caused by cutting in the embedding process and improve the spatial feature extraction capability. Second, for improved multiscale feature fusion in the neck, we employed a depth direction separable deformable pyramid module (DSDP) rather than a feature pyramid network. Experiments on the DOTA, RSOD, and UCAS-AOD datasets demonstrated that our method’s average accuracy (mAP) values reached 0.728, 0.952, and 0.945, respectively, which were comparable to the existing state-of-the-art methods.

Funder

National Natural Science Foundation

General Program of the Natural Science Foundation of Hunan Province

Publisher

MDPI AG

Subject

Electrical and Electronic Engineering,Biochemistry,Instrumentation,Atomic and Molecular Physics, and Optics,Analytical Chemistry

Link

https://www.mdpi.com/1424-8220/23/5/2522/pdf

Reference49 articles.

1. Lee, J., Moon, S., Nam, D.W., Lee, J., Oh, A.R., and Yoo, W. (2020, January 21–23). A Study on the Identification of Warship Type/Class by Measuring Similarity with Virtual Warship. Proceedings of the 2020 International Conference on Information and Communication Technology Convergence (ICTC), Jeju, Republic of Korea.

2. Daniilidis, K., Maragos, P., and Paragios, N. (2010). Computer Vision—ECCV 2010, Proceedings of the 11th European Conference on Computer Vision, Heraklion, Crete, Greece, 5–11 September 2010, Springer.