Enhancing Remote Sensing Object Detection with K-CBST YOLO: Integrating CBAM and Swin-Transformer

Author:

Cheng Aonan 1,2, Xiao Jincheng 1,2, Li Yingcheng 1,2, Sun Yiming 1,2, Ren Yafeng 1,2, Liu Jianli 1,2

Affiliation:

1. National Engineering Research Center of Surveying and Mapping, China TopRS Technology Company Limited, Beijing 100039, China

2. Beijing Low-Altitude Remote Sensing Engineering Technology Research Center, Beijing 100039, China

Abstract

Object detection in remote sensing images faces significant challenges from small target sizes, uneven target distribution, and complex backgrounds. This paper introduces the K-CBST YOLO algorithm, which is designed to address these challenges. It features a novel architecture that integrates the Convolutional Block Attention Module (CBAM) and the Swin-Transformer to enhance global semantic understanding of feature maps and make fuller use of contextual information. This integration significantly improves the accuracy with which small targets are detected against complex backgrounds. Additionally, we propose an improved detection network that combines an improved K-Means algorithm with a smooth Non-Maximum Suppression (NMS) algorithm. The network uses adaptive dynamic K-Means clustering to locate the areas where targets are concentrated in remote sensing images with varied target distributions, and it uses smooth NMS to lower the confidence of overlapping candidate boxes rather than discard them outright, thereby reducing their interference with subsequent detection results. The enhanced algorithm substantially improves the model’s robustness to multi-scale target distributions, preserves more potentially valid information, and reduces the likelihood of missed detections. Experiments were performed on the publicly available DIOR remote sensing image dataset and the DOTA aerial image dataset. The results show that K-CBST YOLO outperforms the other advanced detection algorithms it was compared with on both datasets, achieving a mean Average Precision (mAP) of 68.3% on DIOR and 78.4% on DOTA.
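The idea of suppressing the confidence of overlapping candidate boxes, instead of deleting them, can be illustrated with a short sketch. The abstract does not give the exact form of the smooth NMS used in K-CBST YOLO, so the code below swaps in the well-known Gaussian Soft-NMS decay as a stand-in. The function names, the `sigma` and `score_thresh` parameters, and their default values are assumptions made for illustration, not the authors' implementation.

```python
import numpy as np


def iou_one_to_many(box, boxes):
    """IoU between one box and an array of boxes, all given as [x1, y1, x2, y2]."""
    x1 = np.maximum(box[0], boxes[:, 0])
    y1 = np.maximum(box[1], boxes[:, 1])
    x2 = np.minimum(box[2], boxes[:, 2])
    y2 = np.minimum(box[3], boxes[:, 3])
    inter = np.clip(x2 - x1, 0, None) * np.clip(y2 - y1, 0, None)
    area = (box[2] - box[0]) * (box[3] - box[1])
    areas = (boxes[:, 2] - boxes[:, 0]) * (boxes[:, 3] - boxes[:, 1])
    return inter / (area + areas - inter)


def gaussian_soft_nms(boxes, scores, sigma=0.5, score_thresh=0.001):
    """Gaussian Soft-NMS (assumed stand-in for the paper's smooth NMS).

    Returns the kept box indices in selection order and the decayed scores.
    """
    boxes = np.asarray(boxes, dtype=float)
    scores = np.asarray(scores, dtype=float).copy()
    remaining = np.arange(len(scores))
    keep = []
    while remaining.size > 0:
        # Select the highest-scoring box among those still under consideration.
        best = remaining[np.argmax(scores[remaining])]
        keep.append(int(best))
        remaining = remaining[remaining != best]
        if remaining.size == 0:
            break
        # Decay the scores of the other boxes according to their overlap
        # with the selected box; heavy overlap means strong decay.
        overlaps = iou_one_to_many(boxes[best], boxes[remaining])
        scores[remaining] *= np.exp(-(overlaps ** 2) / sigma)
        # Drop only boxes whose score has fallen below the threshold.
        remaining = remaining[scores[remaining] > score_thresh]
    return keep, scores


if __name__ == "__main__":
    demo_boxes = [[0, 0, 10, 10], [1, 1, 11, 11], [50, 50, 60, 60]]
    demo_scores = [0.9, 0.8, 0.7]
    kept, new_scores = gaussian_soft_nms(demo_boxes, demo_scores)
    print(kept, new_scores)
```

In the demo, the second box overlaps the first heavily. Hard NMS with a typical IoU threshold would delete it, whereas this sketch keeps it with a reduced score, which is the behavior the abstract credits with preserving potentially valid information in densely packed scenes.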

Funder

National Key Research and Development Program of China

Central Guiding Local Technology Development

Publisher

MDPI AG
