MANet: End-to-End Learning for Point Cloud Based on Robust Pointpillar and Multiattention

Authors:

Gan Xingli¹, Shi Hao¹, Yang Shan², Xiao Yao³, Sun Lu⁴

Affiliation:

1. School of Information and Electronic Engineering, Zhejiang University of Science and Technology, Hangzhou 310023, China

2. China Unicom Smart City Research Institute, Beijing 100048, China

3. College of Sericulture, Textile and Biomass Sciences, Southwest University, China

4. Department of Communication Engineering, Institute of Information Science Technology, Dalian Maritime University, China

Abstract

Detecting 3D objects in crowded scenes remains a challenging problem, since cars and pedestrians often cluster together and occlude one another in the real world. Pointpillar is a leading 3D object detector: its detection pipeline is simple and its inference is fast. However, because max pooling is used in the Voxel Feature Encoding (VFE) stage to extract global features, fine-grained features are lost, so the feature pyramid network (FPN) stage lacks expressive power and small objects are detected inaccurately. This paper proposes to improve detection in complex environments by integrating attention mechanisms into the Pointpillar network. In the VFE stage, a mixed-attention (HA) module is added to preserve the spatial structure of the point cloud as fully as possible from three perspectives: local space, global space, and individual points. A Convolutional Block Attention Module (CBAM) is embedded in the FPN to mine deeper information from the pseudo-images. Experiments on the KITTI dataset demonstrate that our method outperforms other state-of-the-art single-stage algorithms. In crowded scenes, the mean average precision (mAP) under the bird's-eye view (BEV) detection benchmark increases from 59.20% for Pointpillar and 66.19% for TANet to 69.91% for ours; the mAP under the 3D detection benchmark increases from 62% for TANet to 65.11% for ours; and the detection speed drops only from 13.1 fps for Pointpillar to 12.8 fps for ours.
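The max-pooling step the abstract criticizes can be illustrated with a toy sketch (a minimal NumPy mock-up, not the authors' implementation; the pillar counts, feature dimensions, and the single shared linear layer are illustrative assumptions standing in for the VFE's PointNet-style MLP):

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical toy setup: 3 pillars, up to 5 points each, 4 input features.
num_pillars, max_points, in_dim, out_dim = 3, 5, 4, 8
points = rng.normal(size=(num_pillars, max_points, in_dim))
mask = rng.random((num_pillars, max_points)) > 0.3  # which point slots hold real points
mask[:, 0] = True                                   # guarantee each pillar is non-empty

# Shared per-point linear layer + ReLU (stand-in for the VFE's per-point MLP).
W = rng.normal(size=(in_dim, out_dim))
point_feats = np.maximum(points @ W, 0.0)           # (3, 5, 8)

# Max pooling over the points of each pillar collapses all per-point detail
# into one global feature vector per pillar -- the step the paper argues
# discards fine-grained spatial structure.
point_feats = np.where(mask[..., None], point_feats, -np.inf)
pillar_feats = point_feats.max(axis=1)              # (3, 8)

print(pillar_feats.shape)  # prints (3, 8)
```

Because only the per-pillar maximum survives, any two point configurations with the same feature-wise maxima become indistinguishable downstream, which is the motivation for retaining point-level and local-spatial cues via the HA module.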

Funder

Radar Signal Processing National Defense Science and Technology Key Laboratory Fund

Publisher

Hindawi Limited

Subject

Electrical and Electronic Engineering, Computer Networks and Communications, Information Systems

References (39 articles):

1. Are we ready for autonomous driving? The KITTI vision benchmark suite

2. PointNet: deep learning on point sets for 3D classification and segmentation; C. R. Qi

3. VoxelNet: end-to-end learning for point cloud based 3D object detection; Y. Zhou

4. 3D Convolutional Neural Networks for Human Action Recognition

5. Single-shot refinement neural network for object detection; S. Zhang

Cited by 2 articles.

1. Artificial intelligence in tunnel construction: A comprehensive review of hotspots and frontier topics; Geohazard Mechanics; 2023-12

2. SWFNet: Efficient Sampling and Weighted Fusion for Enhanced PointPillars in 3D Object Detection; 2023 6th International Conference on Information Communication and Signal Processing (ICICSP); 2023-09-23

