CNN and Transformer Fusion for Remote Sensing Image Semantic Segmentation

Published:2023-09-10 Issue:18 Volume:15 Page:4455
ISSN:2072-4292
Container-title:Remote Sensing
language:en
Short-container-title:Remote Sensing

Author:

Chen Xin¹, Li Dongfen¹, Liu Mingzhe¹, Jia Jiaru¹

Affiliation:

1. State Key Laboratory of Geohazard Prevention and Geoenvironment Protection, Chengdu University of Technology, Chengdu 610059, China

Abstract

Semantic segmentation of remote sensing images is widely used in environmental protection, geological disaster discovery, and natural resource assessment. With the rapid development of deep learning, convolutional neural networks (CNNs) have come to dominate semantic segmentation, relying on their powerful local information extraction capabilities. Because the convolution operation is local, however, it is difficult for CNNs to obtain global context information directly, whereas the Transformer has excellent potential for global information modeling. This paper proposes a new hybrid convolutional and Transformer semantic segmentation model called CTFuse, which uses a multi-scale convolutional attention module in the convolutional part. CTFuse is a serial structure composed of a CNN and a Transformer: it first uses convolution to extract small-size target information and then uses the Transformer to embed large-size ground target information. We also propose a spatial and channel attention module in the convolutional part to enhance the representation of global information and local features, and a spatial and channel attention module in the Transformer part to improve the ability to capture detailed information. Compared with the other models used in the experiments, CTFuse achieves state-of-the-art results on the International Society for Photogrammetry and Remote Sensing (ISPRS) Vaihingen and ISPRS Potsdam datasets.
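
The serial idea in the abstract (a local convolutional stage followed by a global attention stage) can be illustrated with a toy sketch. This is not the CTFuse architecture: it has no learned weights, no multi-scale or spatial/channel attention modules, and the function names `local_stage`, `global_stage`, and `serial_hybrid` are hypothetical. It only shows, under those assumptions, why a convolution alone cannot move information across the image while an attention step can.

```python
import math


def local_stage(image):
    """3x3 mean filter with edge clamping.

    Stands in for a convolution: each output value depends only on
    the pixel's immediate neighbourhood.
    """
    h, w = len(image), len(image[0])
    out = [[0.0] * w for _ in range(h)]
    for y in range(h):
        for x in range(w):
            total = 0.0
            for dy in (-1, 0, 1):
                for dx in (-1, 0, 1):
                    yy = min(max(y + dy, 0), h - 1)  # clamp row index
                    xx = min(max(x + dx, 0), w - 1)  # clamp column index
                    total += image[yy][xx]
            out[y][x] = total / 9.0
    return out


def global_stage(tokens):
    """Single-head self-attention without learned projections.

    Stands in for a Transformer layer: every token is replaced by a
    softmax-weighted mix of all tokens, so information can travel
    any distance in one step.
    """
    d = len(tokens[0])
    out = []
    for q in tokens:
        scores = [sum(a * b for a, b in zip(q, k)) / math.sqrt(d) for k in tokens]
        m = max(scores)  # subtract the max for numerical stability
        exps = [math.exp(s - m) for s in scores]
        z = sum(exps)
        weights = [e / z for e in exps]
        out.append([sum(wt * v[i] for wt, v in zip(weights, tokens)) for i in range(d)])
    return out


def serial_hybrid(image):
    """Serial composition: local features first, then global mixing.

    Each image row is treated as one token (an assumption made only
    to keep the example small).
    """
    return global_stage(local_stage(image))
```

With a 4x4 image whose only non-zero pixel is in the top-left corner, `local_stage` leaves the bottom row at zero, whereas `serial_hybrid` gives the bottom row a non-zero response because the attention step mixes in the distant rows.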

Publisher

MDPI AG

Subject

General Earth and Planetary Sciences

Link

https://www.mdpi.com/2072-4292/15/18/4455/pdf

Reference 59 articles.

1. Improved maize cultivated area estimation over a large scale combining MODIS–EVI time series data and crop phenological information;Zhang;ISPRS J. Photogramm. Remote Sens.,2014

2. Scale sequence joint deep learning (SS-JDL) for land use and land cover classification;Zhang;Remote Sens. Environ.,2020

3. Using aerial imagery and GIS in automated building footprint extraction and shape recognition for earthquake risk assessment of urban inventories;Sahar;IEEE Trans. Geosci. Remote Sens.,2010

4. Joint deep learning for land cover and land use classification;Zhang;Remote Sens. Environ.,2019

5. An improved combination of spectral and spatial features for vegetation classification in hyperspectral images;Fu;Remote Sens.,2017

Cited by 7 articles.

1. Few-shot intent detection with self-supervised pretraining and prototype-aware attention;Pattern Recognition;2024-11

2. Multi-Degradation Super-Resolution Reconstruction for Remote Sensing Images with Reconstruction Features-Guided Kernel Correction;Remote Sensing;2024-08-09

3. DCP-Net: A Distributed Collaborative Perception Network for Remote Sensing Semantic Segmentation;Remote Sensing;2024-07-08

4. CDTracker: Coarse-to-Fine Feature Matching and Point Densification for 3D Single-Object Tracking;Remote Sensing;2024-06-25

5. SRBPSwin: Single-Image Super-Resolution for Remote Sensing Images Using a Global Residual Multi-Attention Hybrid Back-Projection Network Based on the Swin Transformer;Remote Sensing;2024-06-20