MANet: End-to-End Learning for Point Cloud Based on Robust Pointpillar and Multiattention

Author:

Gan Xingli (1), Shi Hao (1), Yang Shan (2), Xiao Yao (3), Sun Lu (4)

Affiliation:

1. School of Information and Electronic Engineering, Zhejiang University of Science and Technology, Hangzhou 310023, China

2. China Unicom Smart City Research Institute, Beijing 100048, China

3. College of Sericulture, Textile and Biomass Sciences, Southwest University, China

4. Department of Communication Engineering, Institute of Information Science Technology, Dalian Maritime University, China

Abstract

Detecting 3D objects in crowded scenes remains challenging because cars and pedestrians often cluster together and occlude one another in the real world. Pointpillar is a leading 3D object detector with a simple detection pipeline and fast inference. However, because it uses max pooling in the Voxel Feature Encoding (VFE) stage to extract global features, fine-grained features are lost. This weakens feature expression in the feature pyramid network (FPN) stage and makes the detection of small targets less accurate. This paper proposes to improve detection in complex environments by integrating attention mechanisms into Pointpillar. In the VFE stage, a mixed-attention module (HA) is added to retain as much of the point cloud's spatial structure as possible from three perspectives: local space, global space, and points. The Convolutional Block Attention Module (CBAM) is embedded in the FPN to mine deep information from the pseudoimages. Experiments on the KITTI dataset show that the method outperforms other state-of-the-art single-stage algorithms. In crowded scenes, the mean average precision (mAP) on the bird’s-eye view (BEV) detection benchmark increases from 59.20% for Pointpillar and 66.19% for TANet to 69.91% for the proposed method, the mAP on the 3D detection benchmark increases from 62% for TANet to 65.11%, and the detection speed drops only from 13.1 fps for Pointpillar to 12.8 fps.
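The abstract's point about max pooling discarding fine-grained features can be illustrated with a toy example. The sketch below is not the paper's HA module. It is a simplified, hypothetical point-wise attention pooling written in plain Python, and the function names, the two-dimensional features, and the scoring weights are all assumptions made for illustration. It shows two pillars with different point layouts that collapse to the same vector under max pooling but stay distinguishable under attention-weighted pooling.

```python
import math


def max_pool_pillar(points):
    """Column-wise max over the per-point feature vectors of one pillar."""
    return [max(column) for column in zip(*points)]


def softmax(scores):
    """Numerically stable softmax over a list of scalar scores."""
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def attention_pool_pillar(points, score_weights):
    """Hypothetical point-wise attention pooling (illustration only).

    Each point gets a scalar score (dot product with score_weights),
    the scores are normalised with softmax, and the pillar feature is
    the attention-weighted sum of the point features.
    """
    scores = [sum(w * x for w, x in zip(score_weights, p)) for p in points]
    alphas = softmax(scores)
    dim = len(points[0])
    return [sum(a * p[d] for a, p in zip(alphas, points)) for d in range(dim)]


# Two toy pillars, each with two points carrying 2-D features.
pillar_a = [[1.0, 0.0], [0.0, 1.0]]
pillar_b = [[1.0, 1.0], [0.0, 0.0]]

# Assumed scoring weights; in a real network these would be learned.
weights = [1.0, 0.0]

pooled_max_a = max_pool_pillar(pillar_a)
pooled_max_b = max_pool_pillar(pillar_b)
pooled_att_a = attention_pool_pillar(pillar_a, weights)
pooled_att_b = attention_pool_pillar(pillar_b, weights)

print("max pool:      ", pooled_max_a, pooled_max_b)
print("attention pool:", pooled_att_a, pooled_att_b)
```

Under max pooling both pillars reduce to the same feature vector, so any later stage cannot tell them apart. The attention-weighted sum keeps a trace of how features are distributed across points, which is the kind of information the abstract says is lost.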

Funder

Radar Signal Processing National Defense Science and Technology Key Laboratory Fund

Publisher

Hindawi Limited

Subject

Electrical and Electronic Engineering, Computer Networks and Communications, Information Systems

Cited by 3 articles.
