PIDFusion: Fusing Dense LiDAR Points and Camera Images at Pixel-Instance Level for 3D Object Detection
Published: 2023-10-13
Issue: 20
Volume: 11
Page: 4277
ISSN: 2227-7390
Container-title: Mathematics
Language: en
Short-container-title: Mathematics
Author:
Zhang Zheng 1, Xu Ruyu 1, Tian Qing 1
Affiliation:
1. School of Information Science and Technology, North China University of Technology, Beijing 100144, China
Abstract
In driverless systems (e.g., subways, buses, and trucks), fusing multi-modal data such as light detection and ranging (LiDAR) points and camera images is essential for accurate 3D object detection. During fusion, information interaction between modalities is challenging because the sensors use different coordinate systems and the densities of the collected data differ significantly. It is therefore necessary to fully exploit the consistency and complementarity of multi-modal information, bridge the density gap between multi-source data, and process multi-source information jointly and interactively. To this end, this paper proposes PIDFusion, a new Transformer-based multi-modal fusion model for 3D object detection. First, the method uses 2D instance segmentation results to generate dense 3D virtual points that densify the original sparse 3D point clouds, mitigating the problem that the nearest neighbor in 2D image space is not guaranteed to be the nearest in 3D space. Second, a new cross-modal fusion architecture maintains separate per-modality features so that the unique characteristics of each modality can be exploited during 3D object detection. Finally, an instance-level fusion module enhances semantic consistency through cross-modal feature interaction. Experiments show that PIDFusion outperforms existing 3D object detection methods, especially for small and long-range objects, achieving 70.8 mAP and 73.5 NDS on the nuScenes test set.
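To make the virtual-point step concrete, below is a minimal NumPy sketch of the commonly used MVP-style baseline it builds on: sample pixels inside a 2D instance mask, assign each the depth of its nearest projected LiDAR point in image space, and unproject to 3D. All names (generate_virtual_points, T_cam_from_lidar, n_samples) are illustrative assumptions, not the paper's implementation; the 2D nearest-neighbor depth assignment shown here is exactly the weakness the abstract says PIDFusion sets out to improve.

```python
import numpy as np

def generate_virtual_points(lidar_xyz, instance_mask, K, T_cam_from_lidar, n_samples=50):
    """Hypothetical sketch of MVP-style virtual point generation.

    lidar_xyz:        (N, 3) points in the LiDAR frame
    instance_mask:    (H, W) boolean mask from 2D instance segmentation
    K:                (3, 3) camera intrinsics
    T_cam_from_lidar: (4, 4) extrinsic transform, LiDAR frame -> camera frame
    Returns (n_samples, 3) virtual points in camera coordinates.
    """
    # Transform LiDAR points into the camera frame; keep points in front of the camera.
    pts_h = np.hstack([lidar_xyz, np.ones((lidar_xyz.shape[0], 1))])
    cam = (T_cam_from_lidar @ pts_h.T).T[:, :3]
    cam = cam[cam[:, 2] > 0]

    # Project to pixel coordinates and keep points that land inside the image.
    uv = (K @ cam.T).T
    uv = uv[:, :2] / uv[:, 2:3]
    H, W = instance_mask.shape
    in_img = (uv[:, 0] >= 0) & (uv[:, 0] < W) & (uv[:, 1] >= 0) & (uv[:, 1] < H)
    uv, cam = uv[in_img], cam[in_img]

    # Randomly sample pixel locations inside the instance mask.
    ys, xs = np.nonzero(instance_mask)
    idx = np.random.choice(len(xs), size=min(n_samples, len(xs)), replace=False)
    samples = np.stack([xs[idx], ys[idx]], axis=1).astype(np.float64)

    # Borrow depth from the nearest projected LiDAR point in 2D image space.
    # This is the step the paper flags: nearest in 2D need not be nearest in 3D.
    d2 = ((samples[:, None, :] - uv[None, :, :]) ** 2).sum(-1)
    depth = cam[d2.argmin(axis=1), 2]

    # Unproject the sampled pixels with the borrowed depths into 3D camera coordinates.
    ones = np.ones((samples.shape[0], 1))
    rays = (np.linalg.inv(K) @ np.hstack([samples, ones]).T).T
    return rays * depth[:, None]
```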
Funder
National Key Research and Development Program of China
Subject
General Mathematics, Engineering (miscellaneous), Computer Science (miscellaneous)
Cited by: 2 articles