YOLO-DCTI: Small Object Detection in Remote Sensing Base on Contextual Transformer Enhancement

Authors:

Min Lingtong (1), Fan Ziman (1), Lv Qinyi (1), Reda Mohamed (2), Shen Linghao (3), Wang Binglu (3)

Affiliation:

1. School of Electronic Information, Northwestern Polytechnical University, Xi’an 710072, China

2. Department of Avionics, Military Technical College, Cairo 4393010, Egypt

3. School of Automation, Northwestern Polytechnical University, Xi’an 710072, China

Abstract

Object detection is a fundamental task in remote sensing image processing, and the detection of small or tiny objects is one of its core components. Although combining CNNs with transformer networks has considerably advanced small object detection, there is still room to improve how information about small objects is extracted and used. In transformer structures in particular, the global modeling of pixel-level information within small objects disregards the complex, intertwined interplay between spatial context information and channel information. As a result, valuable information is easily obscured or lost. To mitigate this limitation, we propose YOLO-DCTI, a framework that builds on the Contextual Transformer (CoT) for detecting small or tiny objects. Specifically, within CoT, we incorporate global residual and local fusion mechanisms throughout the entire input-to-output pipeline. This integration allows the network's deeper intrinsic representations to be exploited and fuses spatial contextual attributes with channel characteristics. Moreover, we propose an improved decoupled contextual transformer detection head, denoted DCTI, to resolve the feature conflicts that arise from performing classification and regression concurrently. Experimental results on the DOTA, VisDrone, and NWPU VHR-10 datasets show that, built on the powerful real-time detection network YOLOv7, the method achieves a better balance between speed and accuracy for tiny targets.

Funder

National Natural Science Foundation of China

Fundamental Research Funds for the Central Universities

Postdoctoral Science Foundation of China

Fourth Special Grant of China Postdoctoral Science Foundation

Publisher

MDPI AG

Subject

General Earth and Planetary Sciences

Cited by 14 articles.