An efficient multi‐scale transformer for satellite image dehazing

Published: 2024-03-19 Issue: 8 Volume: 41
ISSN:0266-4720
Container-title:Expert Systems
language:en
Short-container-title:Expert Systems

Author:

Yang Lei¹², Cao Jianzhong¹², Chen Weining¹, Wang Hao¹, He Lang³⁴⁵

Affiliation:

1. Xi'an Institute of Optics and Precision Mechanics Chinese Academy of Sciences Xi'an China

2. School of Computer Science and Technology University of Chinese Academy of Sciences Beijing China

3. School of Computer Science and Technology Xi'an University of Posts and Telecommunications Xi'an China

4. Shaanxi Key Laboratory of Network Data Analysis and Intelligent Processing Xi'an University of Posts and Telecommunications Xi'an China

5. Xi'an Key Laboratory of Big Data and Intelligent Computing Xi'an University of Posts and Telecommunications Xi'an China

Abstract

Given the impressive success of convolutional neural networks (CNNs) in learning image priors from large datasets, they have been widely used for image restoration tasks. Recently, there has been significant progress in another category of neural architectures: Transformers. These models have demonstrated remarkable performance in natural language tasks and higher‐level vision applications. Although they address some of CNNs' limitations, such as restricted receptive fields and limited adaptability, Transformer models often struggle with highly detailed images, because their computational complexity increases significantly with the image's spatial resolution. As a result, applying them to most high‐resolution image restoration tasks is impractical. In this work, we introduce a novel Transformer model, named DehFormer, that modifies the design of its fundamental components, for example, the multi‐head attention and the feed‐forward network. Specifically, the proposed architecture consists of three modules: (a) the multi‐scale feature aggregation network (MSFAN), (b) the gated‐Dconv feed‐forward network (GFFN), and (c) the multi‐Dconv head transposed attention (MDHTA). In the MDHTA module, we revisit the mechanics of scaled dot‐product attention using per‐element product operations, thereby bypassing the need for matrix multiplications and operating directly in the frequency domain for enhanced efficiency. The GFFN module allows only relevant and valuable information to advance through the network hierarchy, thereby improving the efficiency of information flow within the model. Extensive experiments on the SateHaze1k, RS‐Haze, and RSID datasets show performance that significantly exceeds that of existing methods.
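
The abstract does not give the MDHTA formulation, so the following is only a minimal sketch of the general idea it alludes to: a per‐element product in the frequency domain can replace an explicit all‐pairs computation. It is not the authors' module. The function names and the 1‐D signals are assumptions made for illustration; the only fact relied on is the correlation theorem of the discrete Fourier transform.

```python
import numpy as np


def circular_correlation_direct(q, k):
    """O(N^2) reference: corr[s] = sum_n q[n] * k[(n + s) % N]."""
    n = q.shape[0]
    # np.roll(k, -s)[i] == k[(i + s) % n], so each entry is one shifted dot product.
    return np.array([np.sum(q * np.roll(k, -s)) for s in range(n)])


def circular_correlation_fft(q, k):
    """Same quantity via a per-element product of spectra, O(N log N).

    For real q, the DFT of the circular cross-correlation of q and k
    equals conj(FFT(q)) * FFT(k), so no N x N matrix is ever formed.
    """
    spectrum = np.conj(np.fft.fft(q)) * np.fft.fft(k)  # element-wise product
    return np.fft.ifft(spectrum).real


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    q = rng.standard_normal(256)  # hypothetical flattened "query" feature
    k = rng.standard_normal(256)  # hypothetical flattened "key" feature
    direct = circular_correlation_direct(q, k)
    fast = circular_correlation_fft(q, k)
    print("max abs difference:", np.max(np.abs(direct - fast)))
```

The direct version touches every pair of positions, which is the kind of cost that grows quickly with spatial resolution; the FFT version obtains the same values from two transforms, one element‐wise product, and one inverse transform.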

Funder

National Natural Science Foundation of China

Education Department of Shaanxi Province

Natural Science Basic Research Program of Shaanxi Province

Publisher

Wiley

Link

https://onlinelibrary.wiley.com/doi/pdf/10.1111/exsy.13575
