SRBPSwin: Single-Image Super-Resolution for Remote Sensing Images Using a Global Residual Multi-Attention Hybrid Back-Projection Network Based on the Swin Transformer
Published: 2024-06-20
Issue: 12
Volume: 16
Page: 2252
ISSN: 2072-4292
Container-title: Remote Sensing
Language: en
Short-container-title: Remote Sensing
Author:
Qin Yi 1,2, Wang Jiarong 1, Cao Shenyi 3, Zhu Ming 1, Sun Jiaqi 1,2, Hao Zhicheng 1, Jiang Xin 1
Affiliation:
1. Changchun Institute of Optics, Fine Mechanics and Physics, Chinese Academy of Sciences, Changchun 130033, China
2. University of Chinese Academy of Sciences, Beijing 100049, China
3. Faculty of Electronic and Information Engineering, Xi’an Jiaotong University, Xi’an 710049, China
Abstract
Remote sensing images usually contain abundant targets and complex information distributions. Consequently, networks are required to model both global and local information in the super-resolution (SR) reconstruction of remote sensing images. Existing SR reconstruction algorithms generally focus only on local or global features and neglect effective feedback for reconstruction errors. Therefore, a Global Residual Multi-attention Fusion Back-projection Network (SRBPSwin) is introduced by combining the back-projection mechanism with the Swin Transformer. We incorporate a concatenated Channel and Spatial Attention Block (CSAB) into the Swin Transformer Block (STB) to design a Multi-attention Hybrid Swin Transformer Block (MAHSTB). SRBPSwin develops dense back-projection units to provide bidirectional feedback on reconstruction errors, enhancing the network’s feature extraction capability and improving reconstruction performance. SRBPSwin consists of four main stages: shallow feature extraction, shallow feature refinement, dense back projection, and image reconstruction. Firstly, for the input low-resolution (LR) image, shallow features are extracted and refined through the shallow feature extraction and shallow feature refinement stages. Secondly, multiple up-projection and down-projection units are designed to alternately process features between high-resolution (HR) and LR spaces, yielding more accurate and detailed feature representations. Finally, global residual connections transfer shallow features to the image reconstruction stage. We also propose a perceptual loss function based on the Swin Transformer to enhance the detail of the reconstructed image. Extensive experiments demonstrate the significant reconstruction advantages of SRBPSwin in both quantitative evaluation and visual quality.
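As a concrete illustration of the dense back-projection stage described in the abstract, the sketch below shows one up-projection and one down-projection unit in the DBPN style that this mechanism builds on. It is a minimal PyTorch sketch, not the authors' implementation: SRBPSwin realizes these projections with Swin-Transformer-based blocks, whereas this sketch assumes plain (de)convolutions, and the 8×8 kernel with stride 4 (for 4× SR) is an illustrative configuration.

```python
import torch
import torch.nn as nn


class UpProjection(nn.Module):
    """LR features -> HR features with back-projected error feedback.

    Hypothetical DBPN-style unit; SRBPSwin uses Swin-based blocks instead
    of the plain (de)convolutions assumed here.
    """

    def __init__(self, ch: int, kernel: int = 8, stride: int = 4, pad: int = 2):
        super().__init__()
        self.up1 = nn.ConvTranspose2d(ch, ch, kernel, stride, pad)   # L -> H0
        self.down = nn.Conv2d(ch, ch, kernel, stride, pad)           # H0 -> L0
        self.up2 = nn.ConvTranspose2d(ch, ch, kernel, stride, pad)   # error -> H1
        self.act = nn.PReLU()

    def forward(self, lr: torch.Tensor) -> torch.Tensor:
        h0 = self.act(self.up1(lr))     # tentative HR features
        l0 = self.act(self.down(h0))    # project back down to LR space
        err = l0 - lr                   # reconstruction error in LR space
        h1 = self.act(self.up2(err))    # map the error back up to HR space
        return h0 + h1                  # error-corrected HR features


class DownProjection(nn.Module):
    """HR features -> LR features, mirroring UpProjection."""

    def __init__(self, ch: int, kernel: int = 8, stride: int = 4, pad: int = 2):
        super().__init__()
        self.down1 = nn.Conv2d(ch, ch, kernel, stride, pad)          # H -> L0
        self.up = nn.ConvTranspose2d(ch, ch, kernel, stride, pad)    # L0 -> H0
        self.down2 = nn.Conv2d(ch, ch, kernel, stride, pad)          # error -> L1
        self.act = nn.PReLU()

    def forward(self, hr: torch.Tensor) -> torch.Tensor:
        l0 = self.act(self.down1(hr))   # tentative LR features
        h0 = self.act(self.up(l0))      # project back up to HR space
        err = h0 - hr                   # reconstruction error in HR space
        l1 = self.act(self.down2(err))  # map the error back down to LR space
        return l0 + l1                  # error-corrected LR features


if __name__ == "__main__":
    feats = torch.randn(1, 64, 32, 32)            # shallow LR features
    up, down = UpProjection(64), DownProjection(64)
    hr = up(feats)                                # -> (1, 64, 128, 128) at 4x
    lr = down(hr)                                 # -> (1, 64, 32, 32)
    print(hr.shape, lr.shape)
```

Stacking several such unit pairs and densely connecting their outputs gives the alternating LR/HR error feedback between feature spaces that the abstract describes.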
Funder
Science and Technology Department of Jilin Province of China; Science and Technology Project of Jilin Provincial Education Department of China