Multilevel Geometric Feature Embedding in Transformer Network for ALS Point Cloud Semantic Segmentation-Reference-Cited by-同舟云学术

Multilevel Geometric Feature Embedding in Transformer Network for ALS Point Cloud Semantic Segmentation

Published:2024-09-12 Issue:18 Volume:16 Page:3386
ISSN:2072-4292
Container-title:Remote Sensing
language:en
Short-container-title:Remote Sensing

Author:

Liang Zhuanxin¹^ORCID,Lai Xudong¹^ORCID

Affiliation:

1. School of Remote Sensing and Information Engineering, Wuhan University, Wuhan 430070, China

Abstract

Effective semantic segmentation of Airborne Laser Scanning (ALS) point clouds is a crucial field of study and influences subsequent point cloud application tasks. Transformer networks have made significant progress in 2D/3D computer vision tasks, exhibiting superior performance. We propose a multilevel geometric feature embedding transformer network (MGFE-T), which aims to fully utilize the three-dimensional structural information carried by point clouds and enhance transformer performance in ALS point cloud semantic segmentation. In the encoding stage, compute the geometric features surrounding tee sampling points at each layer and embed them into the transformer workflow. To ensure that the receptive field of the self-attention mechanism and the geometric computation domain can maintain a consistent scale at each layer, we propose a fixed-radius dilated KNN (FR-DKNN) search method to address the limitation of traditional KNN search methods in considering domain radius. In the decoding stage, we aggregate prediction deviations at each level into a unified loss value, enabling multilevel supervision to improve the network’s feature learning ability at different levels. The MGFE-T network can predict the class label of each point in an end-to-end manner. Experiments were conducted on three widely used benchmark datasets. The results indicate that the MGFE-T network achieves superior OA and mF1 scores on the LASDU and DFC2019 datasets and performs well on the ISPRS dataset with imbalanced classes.

Funder

National Natural Science Foundation of China

China State Railway Group Co., Ltd.

Fundamental Research Funds for the Central Universities

Hubei Provincial Geographical National Condition Monitoring Center

Publisher

MDPI AG

Link

https://www.mdpi.com/2072-4292/16/18/3386/pdf

Reference48 articles.

1. Su, H., Maji, S., Kalogerakis, E., and Learned-Miller, E. (2015, January 7–13). Multi-View Convolutional Neural Networks for 3D Shape Recognition. Proceedings of the 2015 IEEE International Conference on Computer Vision (ICCV), Santiago, Chile.

2. Qi, C.R., Su, H., Niessner, M., Dai, A., Yan, M., and Guibas, L.J. (July, January 26). Volumetric and Multi-View CNNs for Object Classification on 3D Data. Proceedings of the 29th IEEE Conference on Computer Vision and Pattern Recognition, (CVPR), Las Vegas, NV, USA.

3. Maturana, D., and Scherer, S. (October, January 28). VoxNet: A 3D Convolutional Neural Network for Real-Time Object Recognition. Proceedings of the 2015 IEEE/RSJ International Conference on Intelligent Robots and Systems (IROS), Hamburg, Germany.

4. Wu, Z., Song, S., Khosla, A., Yu, F., Zhang, L., Tang, X., and Xiao, J. (2015, January 7–12). 3D ShapeNets: A Deep Representation for Volumetric Shapes. Proceedings of the 2015 IEEE Conference on Computer Vision and Pattern Recognition (CVPR), Boston, MA, USA.

5. Charles, R.Q., Su, H., Kaichun, M., and Guibas, L.J. (2017, January 21–26). PointNet: Deep Learning on Point Sets for 3D Classification and Segmentation. Proceedings of the 2017 IEEE Conference on Computer Vision and Pattern Recognition (CVPR), Honolulu, HI, USA.