Affiliation:
1. Faculty of Information Science and Engineering, Ocean University of China, Qingdao 266100, China
2. Ningbo Key Lab of DSP, Zhejiang Wanli University, Ningbo 315100, China
Abstract
Existing point cloud object detection algorithms struggle to capture spatial features effectively across different scales, which often leads to inadequate responses to changes in object size and limited feature extraction capability, and ultimately degrades detection accuracy. To address this problem, we present a point cloud object detection method based on multi-scale feature fusion of the original point cloud and its projection, which aims to improve the multi-scale performance and completeness of feature extraction. First, we design a 3D feature extraction module based on the 3D Swin Transformer. This module pre-processes the point cloud with a 3D Patch Partition step and applies a self-attention mechanism within a 3D sliding window, together with a downsampling strategy, to extract features at different scales effectively. In parallel, we project the 3D point cloud onto a 2D image and extract 2D features with the Swin Transformer. A 2D/3D feature fusion module then integrates the 2D and 3D features at the channel level through point-wise addition and vector concatenation to improve feature completeness. Finally, the fused feature maps are fed into the detection head for efficient object detection. Experimental results show that, compared to Voxel-RCNN, our method improves the average precision of vehicle detection on the KITTI dataset by 1.01% across the three difficulty levels. In addition, visualization analyses show that the proposed algorithm exhibits superior performance in object detection.
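To illustrate the channel-level fusion described above, the following is a minimal sketch (not the authors' implementation) of a 2D/3D fusion block that combines features by point-wise addition and concatenation. It assumes PyTorch, and the module name, channel widths, and the assumption that the 2D image features have already been sampled at the projected locations of the same N points are all hypothetical.

```python
# Minimal sketch (assumed, not the paper's code): channel-level fusion of 2D and
# 3D features via point-wise addition and concatenation, assuming both feature
# sets are already aligned to the same N points.
import torch
import torch.nn as nn


class FeatureFusion2D3D(nn.Module):
    """Hypothetical 2D/3D fusion block: add aligned features point by point,
    then concatenate the sum with both inputs along the channel axis."""

    def __init__(self, c2d: int, c3d: int, c_out: int):
        super().__init__()
        # Project 2D features to the 3D channel width so they can be added.
        self.align = nn.Linear(c2d, c3d)
        # Reduce the concatenated [sum | 2D | 3D] vector to the output width.
        self.mix = nn.Sequential(
            nn.Linear(3 * c3d, c_out),
            nn.ReLU(inplace=True),
        )

    def forward(self, feat2d: torch.Tensor, feat3d: torch.Tensor) -> torch.Tensor:
        # feat2d: (N, c2d) image features sampled at the projected point locations
        # feat3d: (N, c3d) features from the 3D backbone
        f2d = self.align(feat2d)                               # (N, c3d)
        fused_sum = f2d + feat3d                               # point-wise addition
        fused = torch.cat([fused_sum, f2d, feat3d], dim=-1)    # channel concatenation
        return self.mix(fused)                                 # (N, c_out)


if __name__ == "__main__":
    fusion = FeatureFusion2D3D(c2d=96, c3d=128, c_out=128)
    out = fusion(torch.randn(2048, 96), torch.randn(2048, 128))
    print(out.shape)  # torch.Size([2048, 128])
```

In this sketch the fused maps would then be passed to the detection head; the exact alignment of image pixels to points and the channel widths depend on the projection and backbone configurations, which the abstract does not specify.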
Funder
National Natural Science Foundation of China
Zhejiang Provincial Natural Science Foundation of China
Yongjiang Sci-Tech Innovation 2035
Ningbo Municipal Major Project of Science and Technology Innovation 2025