Predicting Maps Using In-Vehicle Cameras for Data-Driven Intelligent Transport
Published: 2023-12-15
Volume: 12
Issue: 24
Page: 5017
ISSN: 2079-9292
Container-title: Electronics
Language: en
Short-container-title: Electronics
Author:
Ma Zhiguo 1, Zhang Yutong 2, Han Meng 1
Affiliation:
1. College of Computer Science and Technology, Zhejiang University, Hangzhou 310058, China
2. Innovation Center for Smart Medical Technologies & Devices, Binjiang Institute of Zhejiang University, Hangzhou 310053, China
Abstract
Bird’s eye view (BEV) semantic maps have become a crucial element of urban intelligent traffic management and monitoring, offering valuable visual and data representations for informed smart-city decision making. Nevertheless, current methodologies continue to underutilize the temporal information embedded in dynamic frames during the BEV feature transformation process. This limitation reduces accuracy when mapping high-speed moving objects, particularly in capturing their shape and dynamic trajectory. To address this challenge, a framework for cross-view semantic segmentation is proposed, developed first in simulated environments and then applied to real-world urban intelligent transportation scenarios. The view-converter module is designed to collate information from multiple first-view observations captured at various angles and in various modes. This module outputs a top-down semantic map that characterizes the spatial layout of objects while preserving useful temporal information in the BEV transformation. The NuScenes dataset is used to evaluate model effectiveness. A novel application is also devised that harnesses transformer networks to map images and video sequences into top-down or full bird’s-eye views. By combining physics-based and constraint-based formulations and conducting ablation studies, the approach is substantiated, highlighting the significance of context above and below a given point in generating these maps. The method is thoroughly validated on the NuScenes dataset and yields state-of-the-art instantaneous mapping results, with particular benefits for smaller dynamic object categories. The experimental findings include a comparison of axial attention with the state-of-the-art (SOTA) model, demonstrating the performance enhancement associated with temporal awareness.
Funder
National Natural Science Foundation of China
Subject
Electrical and Electronic Engineering, Computer Networks and Communications, Hardware and Architecture, Signal Processing, Control and Systems Engineering
References: 46 articles