AFTR: A Robustness Multi-Sensor Fusion Model for 3D Object Detection Based on Adaptive Fusion Transformer
Author:
Zhang Yan1, Liu Kang1ORCID, Bao Hong2, Qian Xu1, Wang Zihan1, Ye Shiqing1, Wang Weicen1
Affiliation:
1. School of Artificial Intelligence, China University of Mining and Technology-Beijing, Beijing 100083, China 2. College of Robotics, Beijing Union University, Beijing 100027, China
Abstract
Multi-modal sensors are the key to ensuring the robust and accurate operation of autonomous driving systems, where LiDAR and cameras are important on-board sensors. However, current fusion methods face challenges due to inconsistent multi-sensor data representations and the misalignment of dynamic scenes. Specifically, current fusion methods either explicitly correlate multi-sensor data features by calibrating parameters, ignoring the feature blurring problems caused by misalignment, or find correlated features between multi-sensor data through global attention, causing rapidly escalating computational costs. On this basis, we propose a transformer-based end-to-end multi-sensor fusion framework named the adaptive fusion transformer (AFTR). The proposed AFTR consists of the adaptive spatial cross-attention (ASCA) mechanism and the spatial temporal self-attention (STSA) mechanism. Specifically, ASCA adaptively associates and interacts with multi-sensor data features in 3D space through learnable local attention, alleviating the problem of the misalignment of geometric information and reducing computational costs, and STSA interacts with cross-temporal information using learnable offsets in deformable attention, mitigating displacements due to dynamic scenes. We show through numerous experiments that the AFTR obtains SOTA performance in the nuScenes 3D object detection task (74.9% NDS and 73.2% mAP) and demonstrates strong robustness to misalignment (only a 0.2% NDS drop with slight noise). At the same time, we demonstrate the effectiveness of the AFTR components through ablation studies. In summary, the proposed AFTR is an accurate, efficient, and robust multi-sensor data fusion framework.
Funder
Key Project of National Nature Science Foundation of China
Subject
Electrical and Electronic Engineering,Biochemistry,Instrumentation,Atomic and Molecular Physics, and Optics,Analytical Chemistry
Reference51 articles.
1. Self-Driving Cars: A City Perspective;Duarte;Sci. Robot.,2019 2. Is It Safe to Drive? An Overview of Factors, Metrics, and Datasets for Driveability Assessment in Autonomous Driving;Guo;IEEE Trans. Intell. Transport. Syst.,2020 3. Life and Death Decisions of Autonomous Vehicles;Bigman;Nature,2020 4. Vora, S., Lang, A.H., Helou, B., and Beijbom, O. (2020, January 13–19). PointPainting: Sequential Fusion for 3D Object Detection. Proceedings of the 2020 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), Seattle, WA, USA. 5. Liu, Z., Tang, H., Amini, A., Yang, X., Mao, H., Rus, D.L., and Han, S. (June, January 29). BEVFusion: Multi-Task Multi-Sensor Fusion with Unified Bird’s-Eye View Representation. Proceedings of the 2023 IEEE International Conference on Robotics and Automation (ICRA), London, UK.
Cited by
1 articles.
订阅此论文施引文献
订阅此论文施引文献,注册后可以免费订阅5篇论文的施引文献,订阅后可以查看论文全部施引文献
|
|