Benchmarking Perception to Streaming Inputs in Vision-Centric Autonomous Driving
Published: 2023-12-16
Issue: 24
Volume: 11
Page: 4976
ISSN: 2227-7390
Container-title: Mathematics
Language: en
Short-container-title: Mathematics
Authors:
Jin Tianshi 1, Ding Weiping 1, Yang Mingliang 1, Zhu Honglin 1, Dai Peisong 1
Affiliation:
1. School of Mechanical Engineering, Southwest Jiaotong University, Chengdu 610031, China
Abstract
In recent years, vision-centric perception has played a crucial role in autonomous driving tasks such as 3D detection, map construction, and motion forecasting. However, when deployed in practical scenarios, vision-centric approaches incur substantial latency, and their online results often deviate significantly from those obtained during offline training. This disparity arises because conventional benchmarks for autonomous driving perception conduct predominantly offline evaluations, largely overlooking the latency encountered in real-world deployment. Although a few benchmarks have addressed this limitation by introducing evaluation methods for online perception, they do not adequately account for the complexity of the input information stream. To address this gap, we propose the Autonomous driving Streaming I/O (ASIO) benchmark, which assesses the streaming input characteristics and online performance of vision-centric perception in autonomous driving. To enable this evaluation across diverse streaming inputs, we first establish a dataset based on the CARLA Leaderboard. In line with real-world deployment considerations, we further develop evaluation metrics based on information complexity, tailored specifically to streaming inputs and streaming performance. Experimental results reveal significant variations in model performance and ranking under different major-camera deployments, underscoring the need to account thoroughly for the effects of model latency and streaming input characteristics in real-world deployment. To improve streaming performance consistently across distinct streaming input features, we introduce a backbone switcher driven by the identified streaming input characteristics; experimental validation demonstrates that it consistently improves streaming performance across varying streaming input features.
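The two mechanisms summarized in the abstract, latency-aware streaming evaluation and a complexity-driven backbone switcher, can be illustrated with a minimal Python sketch. This is not the ASIO implementation: the switcher interface, the complexity_fn and score_fn callables, the latency-budget heuristic, and the simplification that inference restarts on every frame are all assumptions made for exposition.

import time

# Minimal sketch (assumptions, not the ASIO implementation): pick the
# heaviest backbone whose measured latency fits a per-frame budget that
# shrinks as the estimated input complexity grows.
class BackboneSwitcher:
    def __init__(self, backbones):
        # backbones: list of (model, measured_latency_s), fastest first.
        self.backbones = backbones

    def select(self, complexity, frame_interval_s):
        # Assumed heuristic: complexity in [0, 1] tightens the budget.
        budget = frame_interval_s * max(0.2, 1.0 - complexity)
        chosen = self.backbones[0][0]  # fall back to the fastest model
        for model, latency_s in self.backbones:
            if latency_s <= budget:
                chosen = model  # heaviest model still within budget
        return chosen

# Streaming evaluation: each frame is scored against the newest
# prediction that had *finished* by the frame's timestamp, so a slow
# model is penalized with stale outputs rather than scored offline.
def streaming_eval(stream, switcher, complexity_fn, score_fn, frame_interval_s):
    ready_pred = None                     # last completed prediction
    pending_pred, pending_ready_at = None, float("inf")
    total, n = 0.0, 0
    for timestamp, frame, ground_truth in stream:
        if pending_ready_at <= timestamp:
            ready_pred = pending_pred     # in-flight inference finished
        total += score_fn(ready_pred, ground_truth)  # None on early frames
        n += 1
        # Simplification: inference is (re)started on every frame.
        model = switcher.select(complexity_fn(frame), frame_interval_s)
        start = time.perf_counter()
        pending_pred = model(frame)
        pending_ready_at = timestamp + (time.perf_counter() - start)
    return total / max(n, 1)

In this sketch, stream yields (timestamp, frame, ground_truth) tuples, and score_fn must tolerate a None prediction before the first inference completes; both are hypothetical interfaces chosen to keep the example self-contained.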
Funder
Natural Science Foundation of Sichuan Province; SWJTU Science and Technology Innovation Project
Subject
General Mathematics, Engineering (miscellaneous), Computer Science (miscellaneous)
Cited by: 1 article.