Regional-to-Local Point-Voxel Transformer for Large-Scale Indoor 3D Point Cloud Semantic Segmentation

Published:2023-10-05 Issue:19 Volume:15 Page:4832
ISSN:2072-4292
Container-title:Remote Sensing
language:en
Short-container-title:Remote Sensing

Author:

Li Shuai¹, Li Hongjun¹

Affiliation:

1. College of Science, Beijing Forestry University, Beijing 100083, China

Abstract

Semantic segmentation of large-scale indoor 3D point cloud scenes is crucial for scene understanding, but it remains difficult to model long-range dependencies and multi-scale features effectively. In this paper, we present RegionPVT, a novel Regional-to-Local Point-Voxel Transformer that integrates voxel-based regional self-attention with window-based point-voxel self-attention to learn coarse-grained and fine-grained features concurrently. The voxel-based regional branch captures regional context and enables communication between windows. The window-based point-voxel branch concentrates on local feature learning while integrating voxel-level information within each window. This design lets the model extract local details and regional structures jointly and efficiently, which supports multi-scale feature fusion and a comprehensive understanding of 3D point clouds. Extensive experiments on the S3DIS and ScanNet v2 datasets demonstrate that RegionPVT achieves competitive or superior performance compared with state-of-the-art approaches, attaining mIoUs of 71.0% and 73.9%, respectively, with a significantly lower memory footprint.
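
The abstract gives no implementation details, so the following is only a minimal sketch of the regional-to-local idea under stated assumptions, not the authors' implementation. It assumes a single attention head with identity projections, cubic windows of a hypothetical size `window_size`, and one regional token per window obtained by mean-pooling point features as a stand-in for voxel features. The names `attention` and `regional_to_local` are illustrative.

```python
import math
from collections import defaultdict

def attention(queries, keys, values):
    """Single-head scaled dot-product attention with identity projections."""
    dim = len(queries[0])
    out = []
    for q in queries:
        scores = [sum(a * b for a, b in zip(q, k)) / math.sqrt(dim) for k in keys]
        top = max(scores)  # subtract the max for a numerically stable softmax
        weights = [math.exp(s - top) for s in scores]
        total = sum(weights)
        out.append([sum(w * v[d] for w, v in zip(weights, values)) / total
                    for d in range(len(values[0]))])
    return out

def regional_to_local(coords, feats, window_size):
    """Hypothetical two-stage block: regional attention, then per-window local attention."""
    # 1. Group point indices into cubic windows (assumed partitioning scheme).
    windows = defaultdict(list)
    for i, (x, y, z) in enumerate(coords):
        key = (math.floor(x / window_size),
               math.floor(y / window_size),
               math.floor(z / window_size))
        windows[key].append(i)
    window_keys = sorted(windows)

    # 2. One regional token per window: mean of its point features
    #    (an assumed stand-in for a voxel feature).
    dim = len(feats[0])
    regional = []
    for key in window_keys:
        idx = windows[key]
        regional.append([sum(feats[i][d] for i in idx) / len(idx) for d in range(dim)])

    # 3. Regional branch: tokens attend to each other, so context crosses windows.
    regional = attention(regional, regional, regional)

    # 4. Local branch: points in a window attend to that window's points
    #    plus the window's updated regional token.
    out = [None] * len(feats)
    for key, token in zip(window_keys, regional):
        idx = windows[key]
        local = [feats[i] for i in idx]
        context = local + [token]
        for i, updated in zip(idx, attention(local, context, context)):
            out[i] = updated
    return out

# Two points in different windows: each output mixes in the other window's feature.
demo = regional_to_local([(0.1, 0.1, 0.1), (5.2, 5.2, 5.2)],
                         [[1.0, 0.0], [0.0, 1.0]],
                         window_size=1.0)
print(demo)
```

In this sketch the only path by which a point sees another window is the regional token, which mirrors the abstract's description of the regional branch as the channel for inter-window communication. A real model would add learned projections, multiple heads, positional encoding, and hierarchical downsampling.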

Publisher

MDPI AG

Subject

General Earth and Planetary Sciences

Link

https://www.mdpi.com/2072-4292/15/19/4832/pdf

Reference: 72 articles.

1. Deep multi-modal object detection and semantic segmentation for autonomous driving: Datasets, methods, and challenges;Feng;IEEE Trans. Intell. Transp. Syst.,2020

2. Ando, A., Gidaris, S., Bursuc, A., Puy, G., Boulch, A., and Marlet, R. (2023, January 18–22). RangeViT: Towards Vision Transformers for 3D Semantic Segmentation in Autonomous Driving. Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, Vancouver, BC, Canada.

3. 3D-MiniNet: Learning a 2D representation from point clouds for fast and efficient 3D LIDAR semantic segmentation;Alonso;IEEE Robot. Autom. Lett.,2020

4. Enhancing semantic segmentation for robotics: The power of 3-D entangled forests;Wolf;IEEE Robot. Autom. Lett.,2015

5. Ishikawa, Y., Hachiuma, R., Ienaga, N., Kuno, W., Sugiura, Y., and Saito, H. (2019, January 23–27). Semantic segmentation of 3D point cloud to virtually manipulate real living space. Proceedings of the 2019 12th Asia Pacific Workshop on Mixed and Augmented Reality (APMAR), Nara, Japan.