Center-Aware 3D Object Detection with Attention Mechanism Based on Roadside LiDAR
Published: 2023-02-01
Volume: 15
Issue: 3
Page: 2628
ISSN: 2071-1050
Container-title: Sustainability
Short-container-title: Sustainability
Language: en
Author:
Shi Haobo 1,2,3, Hou Dezao 1,2,3, Li Xiyao 1,2,3
Affiliation:
1. Research Institute of Highway, Ministry of Transport, Beijing 100088, China
2. Key Laboratory of Intelligent Transportation Technology and Transportation Industry, Beijing 100088, China
3. National Intelligent Transport Systems Center of Engineering and Technology, Beijing 100088, China
Abstract
Infrastructure-side 3D object detection is a pivotal component of Vehicle-Infrastructure Cooperated Autonomous Driving (VICAD). Because turning objects account for a high proportion of the traffic at intersections, an anchor-free representation in the bird's-eye view (BEV) is better suited to roadside 3D detection. In this work, we propose CetrRoad, a simple yet effective center-aware detector with a transformer-based detection head for roadside 3D object detection from a single LiDAR (Light Detection and Ranging) sensor. CetrRoad first applies a voxel-based roadside LiDAR feature encoder that voxelizes and projects the raw point cloud into a dense BEV feature representation, followed by a one-stage center proposal module that initializes object center candidates from the top-N points of a BEV target heatmap rendered with unnormalized 2D Gaussians. Then, taking the attended center proposals as query embeddings, a detection head with multi-head self-attention and multi-scale multi-head deformable cross-attention refines and predicts 3D bounding boxes for the different classes moving or parked at an intersection. Extensive experiments and analyses demonstrate that our method achieves state-of-the-art performance on the DAIR-V2X-I benchmark with an acceptable training time cost, especially for Car and Cyclist. CetrRoad also achieves results comparable to a multi-modal fusion method for Pedestrian. An ablation study demonstrates that center-aware queries as input provide denser supervision than a purified feature map in the attention-based detection head. Moreover, we observed that in complex traffic environments our proposed model produces more accurate 3D detection results than the compared methods, with fewer false positives, which benefits downstream VICAD tasks.
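The center proposal step summarized above — splatting unnormalized 2D Gaussians for object centers onto a BEV heatmap and selecting the top-N peaks as query candidates — can be sketched as follows. This is a minimal NumPy illustration under stated assumptions, not the paper's implementation: the function names, the 3×3 local-maximum suppression, and the heatmap size are all assumed for the example.

```python
import numpy as np

def draw_gaussian(heatmap, center, sigma):
    """Splat an unnormalized 2D Gaussian (peak value 1) at center = (x, y)."""
    h, w = heatmap.shape
    ys, xs = np.mgrid[0:h, 0:w]
    g = np.exp(-((xs - center[0]) ** 2 + (ys - center[1]) ** 2) / (2.0 * sigma ** 2))
    # Overlapping objects keep the per-pixel maximum, as in center-based heads.
    np.maximum(heatmap, g, out=heatmap)
    return heatmap

def topn_center_proposals(heatmap, n):
    """Return the top-N local maxima of the BEV heatmap as (x, y) center candidates."""
    h, w = heatmap.shape
    # 3x3 neighborhood maximum via padding (a simple max-pool-style NMS).
    padded = np.pad(heatmap, 1, constant_values=-np.inf)
    neigh = np.stack([padded[dy:dy + h, dx:dx + w]
                      for dy in range(3) for dx in range(3)])
    is_peak = heatmap >= neigh.max(axis=0)
    scores = np.where(is_peak, heatmap, -np.inf).ravel()
    idx = np.argsort(scores)[::-1][:n]
    ys, xs = np.unravel_index(idx, heatmap.shape)
    return list(zip(xs.tolist(), ys.tolist()))

# Two hypothetical object centers on a 64x64 BEV grid.
hm = np.zeros((64, 64))
draw_gaussian(hm, (10, 20), sigma=2.0)
draw_gaussian(hm, (40, 50), sigma=2.0)
proposals = topn_center_proposals(hm, n=2)
```

In the full model these top-N peak locations index the BEV feature map to form the query embeddings consumed by the attention-based detection head.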
Funder
Joint Funds of the National Natural Science Foundation of China
Subject
Management, Monitoring, Policy and Law; Renewable Energy, Sustainability and the Environment; Geography, Planning and Development; Building and Construction
Cited by: 2 articles.