Affiliation:
1. Jiangsu Key Laboratory of Big Data Analysis Technology (B-DAT), Jiangsu Collaborative Innovation Center of Atmospheric Environment and Equipment Technology (CICAEET), Nanjing University of Information Science and Technology, Nanjing 210044, China
2. School of Electronic Information Engineering, Suzhou Vocational University, Suzhou 215104, China
Abstract
Multispectral image (MSI) and hyperspectral image (HSI) fusion (MHIF) aims to address the challenge of acquiring high-resolution (HR) HSIs. This field combines a low-resolution (LR) HSI with an HR-MSI to reconstruct HR-HSIs. Existing methods directly use transformers for feature extraction and fusion. Despite their demonstrated success, these methods have two limitations: (1) Employing the entire transformer model for feature extraction and fusion fails to fully harness the potential of the transformer to integrate the spectral information of the HSI with the spatial information of the MSI. (2) HSIs have strong spectral correlation and exhibit sparsity in the spatial domain. Existing transformer-based models do not exploit this physical property, which makes them prone to spectral distortion. To address these issues, this paper introduces a novel framework for MHIF called the Sparse Mix-Attention Transformer (SMAformer). Specifically, to fully harness the advantages of the transformer architecture, we propose a Spectral Mix-Attention Block (SMAB), which concatenates the keys and values extracted from LR-HSIs and HR-MSIs to form a new multihead attention module. This design facilitates the extraction of detailed long-range information across the spatial and spectral dimensions. Additionally, to address the spatial sparsity inherent in HSIs, we incorporate a sparse mechanism into the core of the SMAB, yielding the Sparse Spectral Mix-Attention Block (SSMAB). In the SSMAB, we compute attention maps from queries and keys and select the K most highly correlated values to form the sparse-attention map. This approach yields a sparse representation of the spatial information while suppressing spatially disruptive noise. Extensive experiments conducted on three synthetic benchmark datasets, namely CAVE, Harvard, and Pavia Center, demonstrate that SMAformer outperforms state-of-the-art methods.
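The two mechanisms summarized in the abstract, concatenating keys/values from both modalities and keeping only the top-K attention scores per query, can be sketched in NumPy as follows. This is a minimal single-head illustration written by us, not the authors' implementation; all function and variable names (e.g. `sparse_mix_attention`, `top_k`) are our own, and the real SMAformer operates on multihead feature tensors rather than flat token matrices.

```python
import numpy as np

def softmax(x, axis=-1):
    # Numerically stable softmax; -inf entries become exactly zero weight.
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

def sparse_mix_attention(q_hsi, k_hsi, v_hsi, k_msi, v_msi, top_k):
    """Single-head sketch of the sparse spectral mix-attention idea.

    Keys and values from the LR-HSI and HR-MSI branches are concatenated
    along the token axis (the "mix"), scores are computed against the HSI
    queries, and only the top_k highest-scoring keys per query are kept;
    the rest are masked to -inf before the softmax, giving a sparse
    attention map.
    """
    k = np.concatenate([k_hsi, k_msi], axis=0)          # (N_h + N_m, d)
    v = np.concatenate([v_hsi, v_msi], axis=0)          # (N_h + N_m, d)
    scores = q_hsi @ k.T / np.sqrt(q_hsi.shape[-1])     # (N_q, N_h + N_m)
    # Per-query threshold: the top_k-th largest score in each row.
    kth = np.partition(scores, -top_k, axis=-1)[:, -top_k][:, None]
    masked = np.where(scores >= kth, scores, -np.inf)   # sparsify
    attn = softmax(masked, axis=-1)                     # sparse attention map
    return attn @ v                                     # (N_q, d)

# Toy usage with random features from both modalities.
rng = np.random.default_rng(0)
q = rng.standard_normal((4, 8))
out = sparse_mix_attention(q,
                           rng.standard_normal((6, 8)),  # k_hsi
                           rng.standard_normal((6, 8)),  # v_hsi
                           rng.standard_normal((5, 8)),  # k_msi
                           rng.standard_normal((5, 8)),  # v_msi
                           top_k=3)
```

With `top_k=3`, each of the 4 output rows is a convex combination of at most 3 of the 11 concatenated value vectors, which is the sparsification the abstract attributes to the SSMAB.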
Funder
Seventh Batch of Science and Technology Development Plan (Agriculture) Project of Suzhou
NSFC
Subject
General Earth and Planetary Sciences