AFTR: A Robustness Multi-Sensor Fusion Model for 3D Object Detection Based on Adaptive Fusion Transformer-Reference-Cited by-同舟云学术

AFTR: A Robustness Multi-Sensor Fusion Model for 3D Object Detection Based on Adaptive Fusion Transformer

Published:2023-10-12 Issue:20 Volume:23 Page:8400
ISSN:1424-8220
Container-title:Sensors
language:en
Short-container-title:Sensors

Author:

Zhang Yan¹,Liu Kang¹^ORCID,Bao Hong²,Qian Xu¹,Wang Zihan¹,Ye Shiqing¹,Wang Weicen¹

Affiliation:

1. School of Artificial Intelligence, China University of Mining and Technology-Beijing, Beijing 100083, China

2. College of Robotics, Beijing Union University, Beijing 100027, China

Abstract

Multi-modal sensors are the key to ensuring the robust and accurate operation of autonomous driving systems, where LiDAR and cameras are important on-board sensors. However, current fusion methods face challenges due to inconsistent multi-sensor data representations and the misalignment of dynamic scenes. Specifically, current fusion methods either explicitly correlate multi-sensor data features by calibrating parameters, ignoring the feature blurring problems caused by misalignment, or find correlated features between multi-sensor data through global attention, causing rapidly escalating computational costs. On this basis, we propose a transformer-based end-to-end multi-sensor fusion framework named the adaptive fusion transformer (AFTR). The proposed AFTR consists of the adaptive spatial cross-attention (ASCA) mechanism and the spatial temporal self-attention (STSA) mechanism. Specifically, ASCA adaptively associates and interacts with multi-sensor data features in 3D space through learnable local attention, alleviating the problem of the misalignment of geometric information and reducing computational costs, and STSA interacts with cross-temporal information using learnable offsets in deformable attention, mitigating displacements due to dynamic scenes. We show through numerous experiments that the AFTR obtains SOTA performance in the nuScenes 3D object detection task (74.9% NDS and 73.2% mAP) and demonstrates strong robustness to misalignment (only a 0.2% NDS drop with slight noise). At the same time, we demonstrate the effectiveness of the AFTR components through ablation studies. In summary, the proposed AFTR is an accurate, efficient, and robust multi-sensor data fusion framework.

Funder

Key Project of National Nature Science Foundation of China

Publisher

MDPI AG

Subject

Electrical and Electronic Engineering,Biochemistry,Instrumentation,Atomic and Molecular Physics, and Optics,Analytical Chemistry

Link

https://www.mdpi.com/1424-8220/23/20/8400/pdf

Reference51 articles.

1. Self-Driving Cars: A City Perspective;Duarte;Sci. Robot.,2019

2. Is It Safe to Drive? An Overview of Factors, Metrics, and Datasets for Driveability Assessment in Autonomous Driving;Guo;IEEE Trans. Intell. Transport. Syst.,2020

3. Life and Death Decisions of Autonomous Vehicles;Bigman;Nature,2020

4. Vora, S., Lang, A.H., Helou, B., and Beijbom, O. (2020, January 13–19). PointPainting: Sequential Fusion for 3D Object Detection. Proceedings of the 2020 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), Seattle, WA, USA.

5. Liu, Z., Tang, H., Amini, A., Yang, X., Mao, H., Rus, D.L., and Han, S. (June, January 29). BEVFusion: Multi-Task Multi-Sensor Fusion with Unified Bird’s-Eye View Representation. Proceedings of the 2023 IEEE International Conference on Robotics and Automation (ICRA), London, UK.

Cited by 1 articles. 订阅此论文施引文献订阅此论文施引文献，注册后可以免费订阅5篇论文的施引文献，订阅后可以查看论文全部施引文献

1. Glaucoma detection model by exploiting multi-region and multi-scan-pattern OCT images with dynamical region score;Biomedical Optics Express;2024-02-02