Affiliation:
1. Faculty of Information Science and Engineering, Ocean University of China, Qingdao 266100, China
2. Ningbo Key Lab of DSP, Zhejiang Wanli University, Ningbo 315100, China
Abstract
Existing point cloud object detection algorithms struggle to capture spatial features effectively across different scales, which often leads to inadequate responses to changes in object size and limited feature extraction capability, and ultimately degrades detection accuracy. To address this problem, we present a point cloud object detection method based on multi-scale feature fusion of the original point cloud and its projection, which aims to improve the multi-scale performance and completeness of feature extraction. First, we design a 3D feature extraction module based on the 3D Swin Transformer. This module pre-processes the point cloud with a 3D Patch Partition step and applies a self-attention mechanism within a 3D sliding window, together with a downsampling strategy, to extract features at different scales effectively. In parallel, we project the 3D point cloud onto a 2D image and extract 2D features with the Swin Transformer. A 2D/3D feature fusion module then integrates the 2D and 3D features at the channel level through point-wise addition and vector concatenation to improve feature completeness. Finally, the fused feature maps are fed into the detection head for efficient object detection. Experimental results show that, compared to Voxel-RCNN, our method improves the average precision of vehicle detection on the KITTI dataset by 1.01% across the three difficulty levels. In addition, visualization analyses show that the proposed algorithm exhibits superior performance in object detection.
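To illustrate the channel-level fusion described above, the following is a minimal sketch (not the authors' implementation) of a 2D/3D fusion block that combines features by point-wise addition and concatenation. It assumes PyTorch, and the module name, channel widths, and the assumption that the 2D image features have already been sampled at the projected locations of the same N points are all hypothetical.

```python
# Minimal sketch (assumed, not the paper's code): channel-level fusion of 2D and
# 3D features via point-wise addition and concatenation, assuming both feature
# sets are already aligned to the same N points.
import torch
import torch.nn as nn


class FeatureFusion2D3D(nn.Module):
    """Hypothetical 2D/3D fusion block: add aligned features point by point,
    then concatenate the sum with both inputs along the channel axis."""

    def __init__(self, c2d: int, c3d: int, c_out: int):
        super().__init__()
        # Project 2D features to the 3D channel width so they can be added.
        self.align = nn.Linear(c2d, c3d)
        # Reduce the concatenated [sum | 2D | 3D] vector to the output width.
        self.mix = nn.Sequential(
            nn.Linear(3 * c3d, c_out),
            nn.ReLU(inplace=True),
        )

    def forward(self, feat2d: torch.Tensor, feat3d: torch.Tensor) -> torch.Tensor:
        # feat2d: (N, c2d) image features sampled at the projected point locations
        # feat3d: (N, c3d) features from the 3D backbone
        f2d = self.align(feat2d)                               # (N, c3d)
        fused_sum = f2d + feat3d                               # point-wise addition
        fused = torch.cat([fused_sum, f2d, feat3d], dim=-1)    # channel concatenation
        return self.mix(fused)                                 # (N, c_out)


if __name__ == "__main__":
    fusion = FeatureFusion2D3D(c2d=96, c3d=128, c_out=128)
    out = fusion(torch.randn(2048, 96), torch.randn(2048, 128))
    print(out.shape)  # torch.Size([2048, 128])
```

In this sketch the fused maps would then be passed to the detection head; the exact alignment of image pixels to points and the channel widths depend on the projection and backbone configurations, which the abstract does not specify.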
Funder
National Natural Science Foundation of China
Zhejiang Provincial Natural Science Foundation of China
Yongjiang Sci-Tech Innovation 2035
Ningbo Municipal Major Project of Science and Technology Innovation 2025