MSPV3D: Multi-Scale Point-Voxels 3D Object Detection Net
-
Published:2024-08-26
Issue:17
Volume:16
Page:3146
-
ISSN:2072-4292
-
Container-title:Remote Sensing
-
language:en
-
Short-container-title:Remote Sensing
Author:
Zhang Zheng1, Bao Zhiping1, Wei Yun2, Zhou Yongsheng3ORCID, Li Ming2, Tian Qing1
Affiliation:
1. School of Information, North China University of Technology, Beijing 100144, China 2. Corporation of Information, Beijing Mass Transit Railway Operation Co., Ltd., Beijing 100044, China 3. School of Information, Beijing University of Chemical Technology, Beijing 100029, China
Abstract
Autonomous vehicle technology is advancing, with 3D object detection based on point clouds being crucial. However, point clouds’ irregularity, sparsity, and large data volume, coupled with irrelevant background points, hinder detection accuracy. We propose a two-stage multi-scale 3D object detection network. Firstly, considering that a large number of useless background points are usually generated by the ground during detection, we propose a new ground filtering algorithm to increase the proportion of foreground points and enhance the accuracy and efficiency of the two-stage detection. Secondly, given that different types of targets to be detected vary in size, and the use of a single-scale voxelization may result in excessive loss of detailed information, the voxels of different scales are introduced to extract relevant features of objects of different scales in the point clouds and integrate them into the second-stage detection. Lastly, a multi-scale feature fusion module is proposed, which simultaneously enhances and integrates features extracted from voxels of different scales. This module fully utilizes the valuable information present in the point cloud across various scales, ultimately leading to more precise 3D object detection. The experiment is conducted on the KITTI dataset and the nuScenes dataset. Compared with our baseline, “Pedestrian” detection improved by 3.37–2.72% and “Cyclist” detection by 3.79–1.32% across difficulty levels on KITTI, and was boosted by 2.4% in NDS and 3.6% in mAP on nuScenes.
Reference41 articles.
1. Zhou, Y., and Tuzel, O. (2018, January 18–22). VoxelNet: End-to-end learning for point cloud based 3D object detection. Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, Salt Lake City, UT, USA. 2. Yan, Y., Mao, Y., and Li, B. (2018). Second: Sparsely embedded convolutional detection. Sensors, 18. 3. Mao, J., Xue, Y., Niu, M., Bai, H., Feng, J., Liang, X., Xu, H., and Xu, C. (2021, January 11–17). Voxel transformer for 3D object detection. Proceedings of the IEEE/CVF International Conference on Computer Vision, Montreal, BC, Canada. 4. Lang, A.H., Vora, S., Caesar, H., Zhou, L., Yang, J., and Beijbom, O. (2019, January 15–20). PointPillars: Fast Encoders for object detection from point clouds. Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, Long Beach, CA, USA. 5. Qi, C.R., Su, H., Mo, K., and Guibas, L.J. (2017, January 21–26). PointNet: Deep learning on point sets for 3D classification and segmentation. Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, Honolulu, HI, USA.
|
|