Abstract
Recently, deep learning has been widely used in segmentation tasks for remote sensing images. However, most existing deep learning methods focus on local contextual information and have a limited receptive field, which makes it difficult to capture the long-range contextual features of large-scale objects in very-high-resolution (VHR) images. In this paper, we present a novel Local–global Framework consisting of a dual-source fusion network and local–global transformer modules, which efficiently utilizes features extracted from multiple sources and fully captures features of local and global regions. The dual-source fusion network is an encoder designed to extract features from multiple sources such as spectra, synthetic aperture radar, and elevation; it selectively fuses features from these sources and reduces the interference of redundant features. The local–global transformer module is proposed to capture fine-grained local features and coarse-grained global features, enabling the framework to recognize multiscale objects across local and global regions. Moreover, we propose a pixelwise contrastive loss, which encourages the prediction to be pulled closer to the ground truth. The Local–global Framework achieves state-of-the-art performance with a 90.45% mean F1 score on the ISPRS Vaihingen dataset and a 93.20% mean F1 score on the ISPRS Potsdam dataset.
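The abstract does not give the exact formulation of the pixelwise contrastive loss. Below is a minimal PyTorch sketch of one plausible prototype-based variant, in which each pixel embedding is pulled toward the mean embedding of its ground-truth class and pushed away from the other classes; the function name, the `temperature` parameter, and the prototype construction are all illustrative assumptions, not the authors' method.

```python
import torch
import torch.nn.functional as F

def pixelwise_contrastive_loss(embeddings, labels, temperature=0.1):
    """Hypothetical pixelwise contrastive loss (sketch, not the paper's formulation).

    embeddings: (B, C, H, W) pixel embeddings from the decoder
    labels:     (B, H, W) integer ground-truth class map
    Each pixel is treated as a query; class prototypes (mean embeddings of the
    ground-truth classes in the batch) serve as positives/negatives.
    """
    B, C, H, W = embeddings.shape
    feats = F.normalize(embeddings, dim=1)             # unit-norm per pixel
    feats = feats.permute(0, 2, 3, 1).reshape(-1, C)   # (B*H*W, C)
    labs = labels.reshape(-1)                          # (B*H*W,)

    classes = labs.unique()                            # classes present in batch
    # Class prototypes: mean embedding over all pixels of each class.
    protos = torch.stack([feats[labs == c].mean(dim=0) for c in classes])
    protos = F.normalize(protos, dim=1)                # (K, C)

    logits = feats @ protos.t() / temperature          # (N, K) similarities
    # Map each pixel's label to its index among the unique classes.
    targets = torch.bucketize(labs, classes)
    # Cross-entropy over prototypes = InfoNCE-style pull/push per pixel.
    return F.cross_entropy(logits, targets)
```

In this variant, minimizing the loss increases each pixel's similarity to its own class prototype relative to the others, which is one common way to realize the "prediction pulled closer to the ground truth" behavior described above.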
Subject
General Earth and Planetary Sciences
Cited by
10 articles.