Author:
Zhang X., Lin D., Xue R., Soergel U.
Abstract
In large-scale urban areas, the diversity of objects and the complexity of scenes pose challenges to semantic segmentation of point clouds. In particular, data imbalance often results in poor performance for rare classes in large scenes. This paper proposes a rare class segmentation method based on a target-guided transformer network. In the network, all feature extraction and segmentation procedures are realized by attention mechanisms. Self-attention blocks are embedded in a U-Net-like structure to gradually integrate features from local to global. Then, under the supervision of our target-guided block, the instance features of the data-imbalanced rare classes are mapped onto the multi-scale features. Finally, a multi-layer perceptron converts the fused features to segmentation logits, from which the semantic labels are generated. Experiments on the Hessigheim High-Resolution 3D Point Cloud Benchmark indicate that our approach outperforms the baseline network by up to 11.66% in terms of mean F1 score. In particular, the rare classes Vehicle and Chimney obtain F1 scores of 82.40% and 82.51%, respectively. Furthermore, our method achieves an overall accuracy of 87.63%, an increase of 1.09% over the baseline model.
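The abstract reports both mean F1 score and overall accuracy. The following minimal sketch shows how these two metrics are commonly computed from a confusion matrix and why overall accuracy can hide poor rare-class performance. The function names and the toy matrix are hypothetical illustrations, not the authors' evaluation code or benchmark data.

```python
def per_class_f1(confusion):
    """Return the F1 score of each class.

    confusion[i][j] is the number of points whose true class is i
    and whose predicted class is j (assumed convention).
    """
    n = len(confusion)
    scores = []
    for c in range(n):
        tp = confusion[c][c]
        fn = sum(confusion[c]) - tp                        # true class c, predicted otherwise
        fp = sum(confusion[r][c] for r in range(n)) - tp   # predicted c, true class differs
        denom = 2 * tp + fp + fn
        scores.append(2 * tp / denom if denom else 0.0)
    return scores


def mean_f1(confusion):
    """Unweighted mean of the per-class F1 scores."""
    scores = per_class_f1(confusion)
    return sum(scores) / len(scores)


def overall_accuracy(confusion):
    """Fraction of all points that are labelled correctly."""
    total = sum(sum(row) for row in confusion)
    correct = sum(confusion[i][i] for i in range(len(confusion)))
    return correct / total if total else 0.0


# Hypothetical toy data: class 0 is common (950 points), class 1 is rare (50 points).
toy = [
    [950, 0],
    [40, 10],
]

print("overall accuracy:", overall_accuracy(toy))
print("per-class F1:", per_class_f1(toy))
print("mean F1:", mean_f1(toy))
```

In this toy case the overall accuracy is high because the common class dominates the point count, while the rare class pulls the mean F1 score down noticeably. This is why the mean F1 score is the more informative measure for rare class segmentation.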