Enhancing Remote Sensing Object Detection with K-CBST YOLO: Integrating CBAM and Swin-Transformer

Published: 2024-08-07; Volume: 16; Issue: 16; Page: 2885
ISSN: 2072-4292
Journal: Remote Sensing
Language: English

Authors:

Cheng Aonan¹,², Xiao Jincheng¹,², Li Yingcheng¹,², Sun Yiming¹,², Ren Yafeng¹,², Liu Jianli¹,²

Affiliations:

1. National Engineering Research Center of Surveying and Mapping, China TopRS Technology Company Limited, Beijing 100039, China

2. Beijing Low-Altitude Remote Sensing Engineering Technology Research Center, Beijing 100039, China

Abstract

Object detection in remote sensing imagery faces significant challenges, including small target sizes, uneven target distributions, and complex backgrounds. This paper introduces the K-CBST YOLO algorithm, which is designed to address these challenges. Its architecture integrates the Convolutional Block Attention Module (CBAM) and Swin-Transformer to enhance the global semantic understanding of feature maps and make fuller use of contextual information. This integration significantly improves the accuracy of detecting small targets against complex backgrounds. In addition, we propose an improved detection network that combines an improved K-Means algorithm with a smooth Non-Maximum Suppression (NMS) algorithm. The network uses adaptive dynamic K-Means clustering to locate regions where targets are concentrated in remote sensing images with varied target distributions. It then applies smooth NMS to suppress the confidence scores of overlapping candidate boxes, which reduces their interference with subsequent detection results. These enhancements substantially improve the model's robustness to multi-scale target distributions, preserve more potentially valid information, and reduce the likelihood of missed detections. Experiments were performed on two public datasets: the DIOR remote sensing image dataset and the DOTA aerial image dataset. K-CBST YOLO outperformed all the other advanced detection algorithms it was compared against on both datasets. It achieved a mean Average Precision (mAP) of 68.3% on DIOR and 78.4% on DOTA.
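The abstract does not specify how its "smooth NMS" computes the reduced confidence scores. The sketch below assumes the step behaves like the Gaussian Soft-NMS penalty: instead of deleting a box that overlaps a higher-scoring one, it multiplies that box's score by exp(-IoU²/σ). The names `iou`, `soft_nms`, `sigma` and `score_thresh` are hypothetical illustrations, not the authors' code or any library's API.

```python
import math


def iou(a, b):
    """Intersection-over-union of two axis-aligned boxes (x1, y1, x2, y2)."""
    ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])
    ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    union = area_a + area_b - inter
    return inter / union if union > 0 else 0.0


def soft_nms(boxes, scores, sigma=0.5, score_thresh=0.001):
    """Gaussian Soft-NMS sketch (assumed stand-in for the paper's smooth NMS).

    Overlapping boxes are not discarded outright. Their scores decay by
    exp(-IoU^2 / sigma), and a box is dropped only when its decayed score
    falls below score_thresh. Returns (index, final_score) pairs in the
    order the boxes were selected.
    """
    # Candidate pool: box index -> current (possibly decayed) score.
    remaining = {i: s for i, s in enumerate(scores) if s >= score_thresh}
    kept = []
    while remaining:
        # Select the box with the highest current score.
        best = max(remaining, key=remaining.get)
        best_score = remaining.pop(best)
        kept.append((best, best_score))
        # Decay every remaining box according to its overlap with the selected one.
        for i in list(remaining):
            overlap = iou(boxes[best], boxes[i])
            remaining[i] *= math.exp(-(overlap ** 2) / sigma)
            if remaining[i] < score_thresh:
                del remaining[i]
    return kept


if __name__ == "__main__":
    boxes = [(0, 0, 10, 10), (1, 1, 11, 11), (50, 50, 60, 60)]
    scores = [0.9, 0.8, 0.7]
    for idx, s in soft_nms(boxes, scores):
        print(idx, round(s, 4))
```

In this toy input, box 1 overlaps box 0 with an IoU of about 0.68. Hard NMS with a 0.5 IoU threshold would discard it. Here it survives with a reduced score of about 0.32, while the non-overlapping box 2 keeps its 0.7. This illustrates the kind of behaviour the abstract describes as preserving potentially valid information. The value of σ is an assumed tuning parameter, not one reported in this record.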

Funder

National Key Research and Development Program of China

Central Guiding Local Technology Development

Publisher

MDPI AG

Link

https://www.mdpi.com/2072-4292/16/16/2885/pdf
