PIDFusion: Fusing Dense LiDAR Points and Camera Images at Pixel-Instance Level for 3D Object Detection
Published: 2023-10-13
Issue: 20
Volume: 11
Page: 4277
ISSN: 2227-7390
Container-title: Mathematics
Language: en
Short-container-title: Mathematics
Author:
Zhang Zheng 1, Xu Ruyu 1, Tian Qing 1
Affiliation:
1. School of Information Science and Technology, North China University of Technology, Beijing 100144, China
Abstract
In driverless systems (e.g., subways, buses, and trucks), fusing multi-modal data such as light detection and ranging (LiDAR) points and camera images is essential for accurate 3D object detection. During fusion, information interaction between modalities is challenging because the sensors use different coordinate systems and the densities of the collected data differ significantly. It is therefore necessary to fully exploit the consistency and complementarity of multi-modal information, bridge the density gap between multi-source data, and process multi-source information jointly and interactively. To this end, this paper proposes PIDFusion, a new Transformer-based multi-modal fusion model for 3D object detection. First, the method uses 2D instance segmentation results to generate dense 3D virtual points that densify the original sparse 3D point clouds, mitigating the problem that the nearest neighbor in 2D image space is not guaranteed to be the nearest in 3D space. Second, a new cross-modal fusion architecture maintains separate per-modality features so that the unique characteristics of each modality can be exploited during 3D object detection. Finally, an instance-level fusion module enhances semantic consistency through cross-modal feature interaction. Experiments show that PIDFusion outperforms existing 3D object detection methods, especially for small and long-range objects, achieving 70.8 mAP and 73.5 NDS on the nuScenes test set.
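To make the virtual-point step concrete, below is a minimal NumPy sketch of the commonly used MVP-style baseline it builds on: sample pixels inside a 2D instance mask, assign each the depth of its nearest projected LiDAR point in image space, and unproject to 3D. All names (generate_virtual_points, T_cam_from_lidar, n_samples) are illustrative assumptions, not the paper's implementation; the 2D nearest-neighbor depth assignment shown here is exactly the weakness the abstract says PIDFusion sets out to improve.

```python
import numpy as np

def generate_virtual_points(lidar_xyz, instance_mask, K, T_cam_from_lidar, n_samples=50):
    """Hypothetical sketch of MVP-style virtual point generation.

    lidar_xyz:        (N, 3) points in the LiDAR frame
    instance_mask:    (H, W) boolean mask from 2D instance segmentation
    K:                (3, 3) camera intrinsics
    T_cam_from_lidar: (4, 4) extrinsic transform, LiDAR frame -> camera frame
    Returns (n_samples, 3) virtual points in camera coordinates.
    """
    # Transform LiDAR points into the camera frame; keep points in front of the camera.
    pts_h = np.hstack([lidar_xyz, np.ones((lidar_xyz.shape[0], 1))])
    cam = (T_cam_from_lidar @ pts_h.T).T[:, :3]
    cam = cam[cam[:, 2] > 0]

    # Project to pixel coordinates and keep points that land inside the image.
    uv = (K @ cam.T).T
    uv = uv[:, :2] / uv[:, 2:3]
    H, W = instance_mask.shape
    in_img = (uv[:, 0] >= 0) & (uv[:, 0] < W) & (uv[:, 1] >= 0) & (uv[:, 1] < H)
    uv, cam = uv[in_img], cam[in_img]

    # Randomly sample pixel locations inside the instance mask.
    ys, xs = np.nonzero(instance_mask)
    idx = np.random.choice(len(xs), size=min(n_samples, len(xs)), replace=False)
    samples = np.stack([xs[idx], ys[idx]], axis=1).astype(np.float64)

    # Borrow depth from the nearest projected LiDAR point in 2D image space.
    # This is the step the paper flags: nearest in 2D need not be nearest in 3D.
    d2 = ((samples[:, None, :] - uv[None, :, :]) ** 2).sum(-1)
    depth = cam[d2.argmin(axis=1), 2]

    # Unproject the sampled pixels with the borrowed depths into 3D camera coordinates.
    ones = np.ones((samples.shape[0], 1))
    rays = (np.linalg.inv(K) @ np.hstack([samples, ones]).T).T
    return rays * depth[:, None]
```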
Funder
National Key Research and Development Program of China
Subject
General Mathematics, Engineering (miscellaneous), Computer Science (miscellaneous)
Cited by: 2 articles