Affiliation:
1. School of Automation, Northwestern Polytechnical University, Xi’an 710072, China
2. Electronic and Computer Engineering, Shenzhen Graduate School, Peking University, Shenzhen 518055, China
3. Radar Research Laboratory, School of Information and Electronics, Beijing Institute of Technology, Beijing 100081, China
Abstract
Transformers have recently gained significant attention in low-level vision tasks, particularly remote sensing image super-resolution (RSISR). The vanilla vision transformer establishes long-range dependencies between image patches, but its global receptive field causes computational complexity to grow quadratically with spatial size, making it inefficient for RSISR tasks, which involve processing large images. To reduce this cost, recent studies have adopted local attention mechanisms inspired by convolutional neural networks (CNNs), restricting interactions to patches within small windows. These approaches, however, suffer from correspondingly small effective receptive fields, and their fixed window sizes prevent them from perceiving multi-scale information, which limits model performance. To address these challenges, we propose a hierarchical transformer named the Multi-Scale and Global Representation Enhancement-based Transformer (MSGFormer). We introduce an efficient attention mechanism, Dual Window-based Self-Attention (DWSA), which combines distributed and concentrated attention to balance computational complexity against receptive field range. We also incorporate a Multi-scale Depth-wise Convolution Attention (MDCA) module that captures multi-scale features through multi-branch convolution. Furthermore, we develop a new Tracing-Back Structure (TBS) that provides tracing-back mechanisms for both proposed attention modules, enhancing their feature representation capability. Extensive experiments demonstrate that MSGFormer outperforms state-of-the-art methods on multiple public RSISR datasets by 0.11–0.55 dB.
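The abstract's complexity argument — global self-attention is quadratic in spatial size, while window attention is linear for a fixed window — can be illustrated with a back-of-the-envelope FLOP count. This is a minimal sketch of the general trade-off, not code from the MSGFormer paper; it counts only the QKᵀ and attention–value matmuls and omits projections.

```python
def global_attention_flops(h, w, c):
    """Global self-attention over all h*w tokens: O((h*w)^2 * c)."""
    n = h * w
    return 2 * n * n * c  # QK^T matmul plus attention @ V matmul

def window_attention_flops(h, w, c, win):
    """Attention restricted to non-overlapping win x win windows:
    O(h*w * win^2 * c), i.e. linear in spatial size for a fixed win."""
    n_windows = (h // win) * (w // win)
    tokens = win * win
    return n_windows * 2 * tokens * tokens * c

# For a 256x256 feature map with 64 channels and 8x8 windows,
# global attention costs (h*w)/win^2 = 1024x more than window attention.
h, w, c, win = 256, 256, 64, 8
ratio = global_attention_flops(h, w, c) / window_attention_flops(h, w, c, win)
print(f"global/window cost ratio: {ratio:.0f}x")  # prints 1024x
```

The ratio grows linearly with image area, which is why fixed-window attention scales to the large inputs typical of RSISR, at the price of the reduced receptive field the abstract describes.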
Funders
Postdoctoral Science Foundation of China
Shaanxi Science Fund for Distinguished Young Scholars
Basic and Applied Basic Research Foundation of Guangdong Province