Affiliation:
1. State Key Laboratory of Resources and Environmental Information Systems, Institute of Geographic Sciences and Natural Resources Research, Chinese Academy of Sciences, Beijing 100101, China
2. College of Resources and Environment, University of the Chinese Academy of Sciences, Beijing 100049, China
Abstract
Challenges in enhancing the multiclass segmentation of remotely sensed data include expensive and scarce labeled samples, complex geo-surface scenes, and resulting biases. The intricate nature of geographical surfaces, comprising varying elements and features, introduces significant complexity to the task of segmentation. The limited label data used to train segmentation models may exhibit biases due to imbalances or the inadequate representation of certain surface types or features. For applications like land use/cover monitoring, the assumption of evenly distributed simple random sampling may be not satisfied due to spatial stratified heterogeneity, introducing biases that can adversely impact the model’s ability to generalize effectively across diverse geographical areas. We introduced two statistical indicators to encode the complexity of geo-features under multiclass scenes and designed a corresponding optimal sampling scheme to select representative samples to reduce sampling bias during machine learning model training, especially that of deep learning models. The results of the complexity scores showed that the entropy-based and gray-based indicators effectively detected the complexity from geo-surface scenes: the entropy-based indicator was sensitive to the boundaries of different classes and the contours of geographical objects, while the Moran’s I indicator had a better performance in identifying the spatial structure information of geographical objects in remote sensing images. According to the complexity scores, the optimal sampling methods appropriately adapted the distribution of the training samples to the geo-context and enhanced their representativeness relative to the population. The single-score optimal sampling method achieved the highest improvement in DeepLab-V3 (increasing pixel accuracy by 0.3% and MIoU by 5.5%), and the multi-score optimal sampling method achieved the highest improvement in SegFormer (increasing ACC by 0.2% and MIoU by 2.4%). These findings carry significant implications for quantifying the complexity of geo-surface scenes and hence can enhance the semantic segmentation of high-resolution remote sensing images with less sampling bias.
Funder
the National Key Research and Development Program of China
Reference84 articles.
1. Ferrari, V., Hebert, M., Sminchisescu, C., and Weiss, Y. (2018, January 8–14). Encoder-Decoder with Atrous Separable Convolution for Semantic Image Segmentation. Proceedings of the Computer Vision–ECCV 2018, Munich, Germany.
2. Fully Convolutional Networks for Semantic Segmentation;Shelhamer;IEEE Trans. Pattern Anal. Mach. Intell.,2017
3. SegNet: A Deep Convolutional Encoder-Decoder Architecture for Image Segmentation;Badrinarayanan;IEEE Trans. Pattern Anal. Mach. Intell.,2017
4. A Survey of Multimodal Hybrid Deep Learning for Computer Vision: Architectures, Applications, Trends, and Challenges;Bayoudh;Infin. Fusion,2024
5. Deep Learning and Geriatric Mental Health;Aizenstein;Am. J. Geriatr. Psychiatry,2023