STransU2Net: Transformer based hybrid model for building segmentation in detailed satellite imagery

Published:2024-09-12 Issue:9 Volume:19 Page:e0299732
ISSN:1932-6203
Container-title:PLOS ONE
language:en
Short-container-title:PLoS ONE

Author:

Liu Guangjie, Diao Kuo, Zhu Jinlong, Wang Qi, Li Meng

Abstract

Buildings are essential components of human society and serve a multitude of functions. Convolutional Neural Networks (CNNs) have made remarkable progress in building extraction from detailed satellite imagery, owing to their strong capability to capture local information. However, CNNs perform suboptimally when extracting larger buildings. Conversely, Transformers excel at capturing global information through self-attention mechanisms but are less effective than CNNs at capturing local information, resulting in suboptimal performance when extracting smaller buildings. Therefore, we designed the hybrid model STransU2Net, which combines carefully designed Transformer and CNN components to extract buildings of various sizes. Specifically, we designed a Bottleneck Pooling Block (BPB) to replace the conventional max pooling layer during the downsampling phase, aiming to enhance the extraction of edge information. Furthermore, we devised the Channel And Spatial Attention Block (CSAB) to enhance target location information during the encoding and decoding stages. Additionally, we added a Swin Transformer Block (STB) at the skip connections to enhance the model’s global modeling ability. Finally, we empirically assessed the performance of STransU2Net on the Aerial imagery and Satellite II datasets, where it achieved state-of-the-art IoU scores of 91.04% and 59.09%, respectively, outperforming other models.
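The abstract reports results as IoU (intersection over union) without defining it. As a minimal sketch, assuming the usual binary-mask definition (pixels labelled building in both masks, divided by pixels labelled building in either), the metric could be computed as below. The function name `binary_iou`, the toy masks, and the convention for two empty masks are illustrative assumptions, not taken from the paper.

```python
from typing import Sequence


def binary_iou(pred: Sequence[Sequence[int]], target: Sequence[Sequence[int]]) -> float:
    """IoU between two binary masks of equal shape (1 = building, 0 = background)."""
    if len(pred) != len(target):
        raise ValueError("masks must have the same number of rows")
    intersection = 0
    union = 0
    for pred_row, target_row in zip(pred, target):
        if len(pred_row) != len(target_row):
            raise ValueError("mask rows must have the same length")
        for p, t in zip(pred_row, target_row):
            p_on, t_on = bool(p), bool(t)
            intersection += p_on and t_on  # pixel is building in both masks
            union += p_on or t_on          # pixel is building in at least one mask
    # Assumed convention: two empty masks count as a perfect match.
    return intersection / union if union else 1.0


# Toy 2x3 masks (hypothetical data): 2 shared building pixels, 4 in the union.
pred = [[1, 1, 0],
        [0, 1, 0]]
target = [[1, 0, 0],
          [0, 1, 1]]
print(binary_iou(pred, target))
```

A dataset-level score such as those quoted above is typically obtained by accumulating intersection and union counts over all test images, or by averaging per-image values; the abstract does not say which variant the authors used.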

Funder

Jilin Provincial Department of Education

Jilin Province Education Science Planning Project

Opening Foundation of State Key Laboratory of Cognitive Intelligence

Publisher

Public Library of Science (PLoS)

References: 50 articles.

1. JointNet: A common neural network for road and building extraction;Z Zhang;Remote Sensing,2019

2. Cheng D, Liao R, Fidler S, Urtasun R. Darnet: Deep active ray network for building segmentation. In: Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition; 2019. p. 7431–7439.

3. Automatic building segmentation of aerial imagery using multi-constraint fully convolutional networks;G Wu;Remote Sensing,2018

4. Chen K, Fu K, Gao X, Yan M, Sun X, Zhang H. Building extraction from remote sensing images with deep learning in a supervised manner. In: 2017 IEEE International Geoscience and Remote Sensing Symposium (IGARSS). IEEE; 2017. p. 1672–1675.

5. Dilated-ResUnet: A novel deep learning architecture for building extraction from medium resolution multi-spectral satellite imagery;M Dixit;Expert Systems with Applications,2021