Cloudformer: A Cloud-Removal Network Combining Self-Attention Mechanism and Convolution
Published:2022-12-03
Issue:23
Volume:14
Page:6132
ISSN:2072-4292
Container-title:Remote Sensing
language:en
Short-container-title:Remote Sensing
Author:
Wu Peiyang, Pan Zongxu, Tang Hairong, Hu Yuxin
Abstract
Optical remote-sensing images have a wide range of applications, but they are often obscured by clouds, which degrades subsequent analysis. Cloud removal is therefore a necessary preprocessing step. This paper proposes Cloudformer, a novel transformer-based network for cloud removal. The method combines the advantages of convolution and self-attention: convolution layers extract simple, short-range features in the shallow layers, while self-attention captures long-range correlations in the deep layers. The method also introduces Locally-enhanced Positional Encoding (LePE), which generates positional encodings suited to each input and uses local information to strengthen the encoding. Extensive experiments on public datasets demonstrate the method's superior ability to remove both thin and thick clouds, and ablation studies validate the effectiveness of the proposed modules.
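The abstract does not spell out how LePE enters the attention computation. The sketch below is only an illustration and is not the authors' implementation: it assumes the LePE formulation known from the CSWin Transformer, in which a depthwise convolution over the values V is added to the ordinary attention output, and it simplifies the 2D image case to a 1D token sequence. All function names and kernel values are hypothetical.

```python
import math


def softmax(row):
    """Numerically stable softmax over a list of scores."""
    m = max(row)
    exps = [math.exp(x - m) for x in row]
    total = sum(exps)
    return [e / total for e in exps]


def self_attention(q, k, v):
    """Scaled dot-product attention over a sequence of token vectors."""
    d = len(q[0])
    out = []
    for qi in q:
        scores = [sum(a * b for a, b in zip(qi, kj)) / math.sqrt(d) for kj in k]
        weights = softmax(scores)
        out.append([sum(w * vj[c] for w, vj in zip(weights, v)) for c in range(d)])
    return out


def depthwise_conv1d(v, kernels):
    """Per-channel 1D convolution with zero padding.

    kernels[c] is the odd-length kernel applied to channel c only, so each
    token's encoding depends on its local neighbourhood in V.
    """
    n, d = len(v), len(v[0])
    out = [[0.0] * d for _ in range(n)]
    for c in range(d):
        kern = kernels[c]
        half = len(kern) // 2
        for i in range(n):
            acc = 0.0
            for t, w in enumerate(kern):
                j = i + t - half
                if 0 <= j < n:  # positions outside the sequence count as zero
                    acc += w * v[j][c]
            out[i][c] = acc
    return out


def attention_with_lepe(q, k, v, kernels):
    """Assumed LePE form: attention output plus a depthwise convolution of V."""
    attn = self_attention(q, k, v)
    lepe = depthwise_conv1d(v, kernels)
    return [[a + p for a, p in zip(ra, rp)] for ra, rp in zip(attn, lepe)]
```

Because the positional term is computed from V itself rather than read from a fixed table, it changes with the input and works for any sequence length, which is one way to read the abstract's "suitable positional encodings for different inputs". For example, `attention_with_lepe(q, k, v, [[0.0, 1.0, 0.0]] * len(v[0]))` adds each token's own value vector to its attention output.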
Funder
the Youth Innovation Promotion Association, CAS
Subject
General Earth and Planetary Sciences
Cited by
6 articles.