Swin-Transformer-Based YOLOv5 for Small-Object Detection in Remote Sensing Images
Authors:
Cao Xuan (1), Zhang Yanwei (2), Lang Song (2), Gong Yan (2, 3)
Affiliations:
1. School of Physical Science and Technology, Suzhou University of Science and Technology, Suzhou 215009, China
2. Suzhou Institute of Biomedical Engineering and Technology, Chinese Academy of Sciences, Suzhou 215613, China
3. Jinan Guoke Medical Technology Development Co., Ltd., Jinan 250104, China
Abstract
This study addresses the low detection accuracy and imprecise localization of small objects in remote sensing images by proposing an improved architecture based on the Swin Transformer and YOLOv5. First, the Complete IoU (CIoU) metric was introduced into the K-means clustering algorithm to generate anchors sized appropriately for the dataset. Second, a modified CSPDarknet53 backbone incorporating the Swin Transformer was proposed to retain sufficient global context information and to extract more discriminative features through multi-head self-attention. Third, in the path-aggregation neck, a simple and efficient weighted bidirectional feature pyramid network was proposed for effective cross-scale feature fusion. In addition, an extra prediction head and new feature fusion layers were added for small objects. Finally, Coordinate Attention (CA) was introduced into the YOLOv5 network to strengthen small-object features and improve detection accuracy in remote sensing images. The effectiveness of the proposed method was demonstrated through several experiments on DOTA (Dataset for Object detection in Aerial images), on which it reached a mean average precision (mAP) of 74.7%. Compared with YOLOv5, the proposed method improved the mAP by 8.9%, achieving higher accuracy for small-object detection in remote sensing images.
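The abstract states only that CIoU was introduced into K-means anchor clustering. The sketch below shows one plausible form of that idea. It is not the authors' code, and all function names are hypothetical. It rests on several assumptions:

- The clustering distance is d = 1 - CIoU.
- Boxes are compared as co-centered (width, height) pairs, so the center-distance term of CIoU vanishes and only the IoU and aspect-ratio terms remain.
- Clusters are updated with the median width and height.
- Centers are initialized deterministically from area quantiles.

```python
import math
import statistics
from typing import List, Tuple

Box = Tuple[float, float]  # (width, height) of a ground-truth box or anchor


def iou_wh(a: Box, b: Box) -> float:
    """IoU of two boxes assumed to share the same center."""
    inter = min(a[0], b[0]) * min(a[1], b[1])
    union = a[0] * a[1] + b[0] * b[1] - inter
    return inter / union


def ciou_wh(box: Box, anchor: Box) -> float:
    """CIoU for co-centered boxes: the center-distance penalty is zero by assumption."""
    iou = iou_wh(box, anchor)
    # Aspect-ratio consistency term v and its trade-off weight alpha, as in CIoU.
    v = (4 / math.pi ** 2) * (math.atan(box[0] / box[1]) - math.atan(anchor[0] / anchor[1])) ** 2
    alpha = v / ((1 - iou) + v) if v > 0 else 0.0
    return iou - alpha * v


def init_anchors(boxes: List[Box], k: int) -> List[Box]:
    """Hypothetical deterministic init: pick k boxes spread across area quantiles."""
    if not 0 < k <= len(boxes):
        raise ValueError("k must be between 1 and the number of boxes")
    ordered = sorted(boxes, key=lambda b: b[0] * b[1])
    n = len(ordered)
    return [ordered[(i * n) // k + n // (2 * k)] for i in range(k)]


def kmeans_ciou(boxes: List[Box], k: int, iters: int = 100) -> List[Box]:
    """K-means over (w, h) pairs with distance 1 - CIoU; returns anchors sorted by area."""
    anchors = init_anchors(boxes, k)
    for _ in range(iters):
        clusters: List[List[Box]] = [[] for _ in range(k)]
        for b in boxes:
            # Assign each box to the anchor with the smallest 1 - CIoU distance.
            nearest = min(range(k), key=lambda i: 1 - ciou_wh(b, anchors[i]))
            clusters[nearest].append(b)
        # Median width/height as the new cluster center; keep the old one if empty.
        new_anchors = [
            (statistics.median([b[0] for b in c]), statistics.median([b[1] for b in c])) if c else anchors[i]
            for i, c in enumerate(clusters)
        ]
        if new_anchors == anchors:  # converged
            break
        anchors = new_anchors
    return sorted(anchors, key=lambda a: a[0] * a[1])


if __name__ == "__main__":
    # Toy (w, h) data with one group of small boxes and one group of large boxes.
    boxes = [(8, 10), (10, 12), (12, 11), (9, 9), (90, 70), (100, 80), (110, 85), (95, 75)]
    print(kmeans_ciou(boxes, k=2))  # prints [(9.5, 10.5), (97.5, 77.5)]
```

Compared with the plain 1 - IoU distance used in earlier YOLO anchor clustering, the extra alpha * v term also penalizes aspect-ratio mismatch. Elongated objects are therefore less likely to share an anchor with square ones of similar area. In YOLOv5 the anchors would be split across the detection heads, and with the extra small-object head described above there are more heads to fill. This sketch only produces the raw sizes.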
Funders
National Natural Science Foundation of China; Jinan Innovation Team; Scientific Research and Equipment Development Project of Chinese Academy of Sciences; Jiangsu Key Disciplines of the Fourteenth Five-Year Plan
Subject
Electrical and Electronic Engineering, Biochemistry, Instrumentation, Atomic and Molecular Physics, and Optics, Analytical Chemistry
Cited by
15 articles.