Query-Informed Multi-Agent Motion Prediction
Author:
Guo Chong 1,2 ORCID, Fan Shouyi 1, Chen Chaoyi 1, Zhao Wenbo 3, Wang Jiawei 1, Zhang Yao 1, Chen Yanhong 1 ORCID
Affiliation:
1. College of Automotive Engineering, Jilin University, Changchun 130025, China
2. Changsha Automobile Innovation Research Institute, Changsha 410005, China
3. FAW Car Co., Ltd., Changchun 130015, China
Abstract
In dynamic environments, autonomous vehicles require accurate decision-making and trajectory planning. To achieve this, they must understand their surroundings and predict the behavior and future trajectories of other traffic participants. In recent years, vectorization methods have dominated motion prediction because of their ability to capture the complex interactions in traffic scenes. However, existing vectorized scene encodings often overlook important physical information about vehicles, such as speed and heading angle, relying on displacement alone to represent an agent's physical attributes; this is insufficient for accurate trajectory prediction. Moreover, an agent's future trajectory can be diverse: at an intersection, for example, a vehicle may proceed straight or turn left or right. Trajectory prediction models should therefore produce multimodal outputs. Prior work has used multiple regression heads to output future trajectories and confidences, but the results have been suboptimal. To address these issues, we propose QINET, a method for accurate multimodal trajectory prediction for all agents in a scene. In the scene encoder, we enrich the feature attributes of agent vehicles to better represent their physical state, and our scene representation is rotationally and spatially invariant. In the decoder, we use cross-attention and induce the generation of multimodal future trajectories through a self-learned query matrix. Experimental results show that QINET achieves state-of-the-art performance on the Argoverse motion prediction benchmark and is capable of fast multimodal trajectory prediction for multiple agents.
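The abstract does not include code, but the decoder idea it describes (a self-learned query matrix cross-attending over encoded scene features to yield K trajectory modes with confidences) can be illustrated with a minimal sketch. All names, shapes, and projection matrices below are our own assumptions, not the paper's implementation:

```python
import numpy as np

def softmax(x, axis=-1):
    """Numerically stable softmax."""
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

def multimodal_decoder(scene_feats, queries, w_out, w_conf):
    """Cross-attention between K learned mode queries and encoded scene tokens.

    scene_feats: (N, D) encoder outputs for N scene tokens (agents + map)
    queries:     (K, D) self-learned query matrix, one row per trajectory mode
    w_out:       (D, T*2) projection to T future (x, y) waypoints per mode
    w_conf:      (D,) projection to a per-mode confidence logit
    Returns K candidate trajectories of shape (K, T, 2) and confidences (K,).
    """
    d = queries.shape[1]
    attn = softmax(queries @ scene_feats.T / np.sqrt(d))   # (K, N) attention weights
    mode_feats = attn @ scene_feats                        # (K, D) per-mode context
    trajs = (mode_feats @ w_out).reshape(len(queries), -1, 2)  # (K, T, 2) waypoints
    conf = softmax(mode_feats @ w_conf)                    # (K,) mode probabilities
    return trajs, conf
```

Because the queries are learned parameters rather than hand-crafted anchors, each row can specialize to a distinct maneuver (e.g., straight, left turn, right turn) during training, which is one common way to obtain multimodal outputs without multiple regression heads.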
Subject
Electrical and Electronic Engineering, Biochemistry, Instrumentation, Atomic and Molecular Physics, and Optics, Analytical Chemistry
References: 42 articles.