Skip-Encoder and Skip-Decoder for Detection Transformer in Optical Remote Sensing
Published: 2024-08-07
Volume: 16
Issue: 16
Page: 2884
ISSN: 2072-4292
Container-title: Remote Sensing
Short-container-title: Remote Sensing
Language: en
Author:
Yang Feifan 1, Chen Gang 1, Duan Jianshu 1
Affiliation:
1. School of Geography and Ocean Science, Nanjing University, Nanjing 210023, China
Abstract
The transformer architecture is gradually gaining attention in remote sensing, and many algorithms based on it have been proposed. The DEtection TRansformer (DETR) is a new approach to object detection that uses the transformer architecture for feature extraction, yet its improved derivative models remain uncommon in remote sensing object detection (RSOD). Hence, we selected the DETR with improved deNoising anchor boxes (DINO) model as a foundation and improved it according to the characteristics of remote sensing images (RSIs). Specifically, we propose a skip-encoder (SE) module for the encoder stage of the model and a skip-decoder (SD) module for the decoder stage. The SE module enhances the model's ability to extract multiscale features, while the SD module reduces computational complexity without degrading performance. Experimental results on the NWPU VHR-10 and DIOR datasets demonstrate that the SE and SD modules help DINO better learn small- and medium-sized targets in RSIs. We achieved a mean average precision of 94.8% on the NWPU VHR-10 dataset and 75.6% on the DIOR dataset.
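The abstract does not spell out how the SE and SD modules work internally. As a loose, hypothetical illustration of the general idea behind layer skipping (bypassing selected encoder layers to save computation while preserving tensor shapes), not the authors' actual SE/SD design, a toy NumPy sketch:

```python
import numpy as np

def encoder_layer(x, w):
    # Toy stand-in for a transformer encoder layer: a nonlinear
    # transform plus a residual connection (shape-preserving).
    return x + np.tanh(x @ w)

def skip_encoder(x, weights, skip_mask):
    # Illustrative skip-style encoder: layers whose mask entry is
    # True are bypassed entirely (identity), so their computation
    # is saved and the output shape is unchanged.
    for w, skip in zip(weights, skip_mask):
        if not skip:
            x = encoder_layer(x, w)
    return x

rng = np.random.default_rng(0)
x = rng.standard_normal((4, 8))                      # 4 tokens, 8-dim features
weights = [rng.standard_normal((8, 8)) * 0.1 for _ in range(6)]

full = skip_encoder(x, weights, [False] * 6)         # all 6 layers run
half = skip_encoder(x, weights, [False, True] * 3)   # every other layer skipped
print(full.shape, half.shape)                        # shapes preserved either way
```

Because skipped layers are pure identities, downstream modules see tensors of the same shape regardless of how many layers ran, which is what makes this kind of compute reduction drop-in.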
Funder
National Natural Science Foundation of China