Skip-Encoder and Skip-Decoder for Detection Transformer in Optical Remote Sensing
Published: 2024-08-07
Volume: 16
Issue: 16
Page: 2884
ISSN: 2072-4292
Container-title: Remote Sensing
Short-container-title: Remote Sensing
Language: en
Author:
Yang Feifan 1, Chen Gang 1, Duan Jianshu 1
Affiliation:
1. School of Geography and Ocean Science, Nanjing University, Nanjing 210023, China
Abstract
The transformer architecture is gradually gaining attention in remote sensing, and many algorithms based on it have been proposed. The DEtection TRansformer (DETR) is a new approach to object detection that uses the transformer architecture for feature extraction, yet its improved derivative models remain uncommon in remote sensing object detection (RSOD). Hence, we selected the DETR with improved deNoising anchor boxes (DINO) model as a foundation and improved it according to the characteristics of remote sensing images (RSIs). Specifically, we propose a skip-encoder (SE) module for the encoder stage of the model and a skip-decoder (SD) module for the decoder stage. The SE module enhances the model's ability to extract multiscale features, while the SD module reduces computational complexity without degrading performance. Experimental results on the NWPU VHR-10 and DIOR datasets demonstrate that the SE and SD modules help DINO better learn small- and medium-sized targets in RSIs. We achieved a mean average precision of 94.8% on the NWPU VHR-10 dataset and 75.6% on the DIOR dataset.
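The abstract does not spell out how the SE and SD modules work internally. As a loose, hypothetical illustration of the general idea behind layer skipping (bypassing selected encoder layers to save computation while preserving tensor shapes), not the authors' actual SE/SD design, a toy NumPy sketch:

```python
import numpy as np

def encoder_layer(x, w):
    # Toy stand-in for a transformer encoder layer: a nonlinear
    # transform plus a residual connection (shape-preserving).
    return x + np.tanh(x @ w)

def skip_encoder(x, weights, skip_mask):
    # Illustrative skip-style encoder: layers whose mask entry is
    # True are bypassed entirely (identity), so their computation
    # is saved and the output shape is unchanged.
    for w, skip in zip(weights, skip_mask):
        if not skip:
            x = encoder_layer(x, w)
    return x

rng = np.random.default_rng(0)
x = rng.standard_normal((4, 8))                      # 4 tokens, 8-dim features
weights = [rng.standard_normal((8, 8)) * 0.1 for _ in range(6)]

full = skip_encoder(x, weights, [False] * 6)         # all 6 layers run
half = skip_encoder(x, weights, [False, True] * 3)   # every other layer skipped
print(full.shape, half.shape)                        # shapes preserved either way
```

Because skipped layers are pure identities, downstream modules see tensors of the same shape regardless of how many layers ran, which is what makes this kind of compute reduction drop-in.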
Funder
National Natural Science Foundation of China