Affiliation:
1. School of Software Technology, Zhejiang University, Hangzhou, China
2. Innovation Center of Yangtze River Delta, Zhejiang University, Jiaxing, China
Abstract
Remote sensing images (RSIs) often contain pronounced background noise, exhibit multi‐scale phenomena, and depict complex scenes in which ground objects follow diverse spatial distribution patterns, all of which make semantic segmentation challenging. CNN‐based methods struggle to capture the diverse spatial distributions of ground objects, especially their compositional relationships, while Vision Transformers (ViTs) introduce background noise and have quadratic time complexity due to dense global matrix multiplications. In this paper, we introduce Adaptive Pattern Matching (APM), a lightweight method for long‐range adaptive weight aggregation. For each pixel, APM obtains a set of pixels belonging to the same spatial distribution pattern and calculates adaptive weights according to their compositional relationships. In addition, we design a tiny U‐shaped network that uses APM as a module to address the large scale variance of ground objects in RSIs. This network is embedded after each stage of a backbone network to form the Multi‐stage U‐shaped Adaptive Pattern Matching Network (MAPMaN), which performs nested multi‐scale modeling of ground objects for semantic segmentation of RSIs. Experiments on three datasets demonstrate that MAPMaN outperforms state‐of‐the‐art methods on common metrics. The code is available at https://github.com/INiid/MAPMaN.
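The quadratic cost attributed to ViTs above can be illustrated with a simple count of pairwise weights. The sketch below is not the APM algorithm and is not taken from the MAPMaN repository; the function names and the per-pixel set size `k` are hypothetical, chosen only to contrast dense global attention (every pixel attends to every pixel) with aggregation over a fixed-size set of pixels per pixel.

```python
def dense_attention_pairs(height: int, width: int) -> int:
    """Pairwise weights in dense global attention: every pixel attends to every pixel."""
    n = height * width
    return n * n


def sparse_aggregation_pairs(height: int, width: int, k: int) -> int:
    """Pairwise weights when each pixel aggregates over at most k selected pixels.

    k is a hypothetical set size used for illustration, not a value from the paper.
    """
    n = height * width
    return n * min(k, n)


if __name__ == "__main__":
    # Compare how the two counts grow as the feature map gets larger.
    for side in (32, 64, 128):
        dense = dense_attention_pairs(side, side)
        sparse = sparse_aggregation_pairs(side, side, k=64)
        print(f"{side}x{side}: dense={dense:,} sparse={sparse:,}")
```

Under these assumptions the dense count grows with the square of the number of pixels, while the fixed-set count grows linearly, which is the kind of gap a lightweight long‐range aggregation method aims to exploit.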
Funder
National Natural Science Foundation of China
Subject
Computer Graphics and Computer-Aided Design