Towards Robust Decision-Making for Autonomous Highway Driving Based on Safe Reinforcement Learning

Published: 2024-06-26 Volume: 24 Issue: 13 Page: 4140
ISSN: 1424-8220
Journal: Sensors
Language: en

Author:

Zhao Rui¹, Chen Ziguo¹, Fan Yuze¹, Li Yun², Gao Fei³

Affiliation:

1. College of Automotive Engineering, Jilin University, Changchun 130025, China

2. Graduate School of Information Science and Technology, The University of Tokyo, Tokyo 113-8654, Japan

3. State Key Laboratory of Automotive Simulation and Control, Jilin University, Changchun 130025, China

Abstract

Reinforcement Learning (RL) methods are regarded as effective for designing autonomous driving policies. However, even when RL policies are trained to convergence, ensuring their robust safety remains a challenge, particularly on long-tail data. Decision-making based on RL must therefore adequately account for potential variations in data distribution. This paper presents a framework for highway autonomous driving decisions that prioritizes both safety and robustness. Using the proposed Replay Buffer Constrained Policy Optimization (RECPO) method, the framework updates RL policies to maximize rewards while keeping them within safety constraints. We use importance sampling to collect and store data in a replay buffer during agent operation, so that data from old policies can be reused to train new policy models, mitigating potential catastrophic forgetting. Additionally, we formulate the highway autonomous driving decision problem as a Constrained Markov Decision Process (CMDP) and train highway driving policies with the proposed RECPO. Finally, we deploy our method in the CARLA simulation environment and compare its performance in typical highway scenarios against traditional Constrained Policy Optimization (CPO), current advanced strategies based on Deep Deterministic Policy Gradient (DDPG), and IDM + MOBIL (the Intelligent Driver Model combined with the Minimizing Overall Braking Induced by Lane changes model). The results show that our framework significantly improves model convergence speed, safety, and decision-making stability, achieving a zero-collision rate in highway autonomous driving.
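
The abstract's idea of reusing data from old policies through importance sampling can be illustrated with a minimal Python sketch. This is not the RECPO algorithm from the paper, whose details are not given in this record. The names `Transition`, `ReplayBuffer`, and `importance_weighted_surrogates`, the advantage fields, and the ratio cap `max_ratio` are all hypothetical and chosen only for illustration. The sketch shows how a stored sample can be reweighted by the ratio of new-policy to old-policy action probability, once for the reward signal and once for the safety-cost signal of a CMDP.

```python
import math
from collections import deque
from dataclasses import dataclass


@dataclass
class Transition:
    """One stored sample (hypothetical layout, not taken from the paper)."""
    reward_adv: float     # advantage estimate for the reward signal
    cost_adv: float       # advantage estimate for the safety-cost signal
    behavior_logp: float  # log-probability of the action under the policy that collected it


class ReplayBuffer:
    """Fixed-capacity buffer; the oldest samples are evicted first."""

    def __init__(self, capacity: int):
        self.data = deque(maxlen=capacity)

    def add(self, transition: Transition) -> None:
        self.data.append(transition)

    def __len__(self) -> int:
        return len(self.data)


def importance_weighted_surrogates(batch, new_logps, max_ratio=10.0):
    """Return (reward_surrogate, cost_surrogate) averaged over the batch.

    Each sample is weighted by pi_new(a|s) / pi_old(a|s), computed from log
    probabilities. Capping the ratio at max_ratio is an assumption made here
    to limit variance; it is not stated in the source.
    """
    reward_sum = 0.0
    cost_sum = 0.0
    for transition, new_logp in zip(batch, new_logps):
        ratio = min(math.exp(new_logp - transition.behavior_logp), max_ratio)
        reward_sum += ratio * transition.reward_adv
        cost_sum += ratio * transition.cost_adv
    n = len(batch)
    return reward_sum / n, cost_sum / n


# Usage: two samples collected by an old policy, re-evaluated under a new one.
buffer = ReplayBuffer(capacity=2)
buffer.add(Transition(reward_adv=2.0, cost_adv=1.0, behavior_logp=math.log(0.5)))
buffer.add(Transition(reward_adv=4.0, cost_adv=-1.0, behavior_logp=math.log(0.5)))
new_logps = [math.log(0.25), math.log(0.5)]
reward_surrogate, cost_surrogate = importance_weighted_surrogates(list(buffer.data), new_logps)
```

In a constrained policy update, the reward surrogate would be maximized while the cost surrogate is held below a safety threshold. How RECPO enforces that constraint is described in the paper itself and is not reproduced here.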

Funder

National Science Foundation of China

Publisher

MDPI AG

Link

https://www.mdpi.com/1424-8220/24/13/4140/pdf

References: 52 articles (first 5 listed)

1. Cui, G., Zhang, W., Xiao, Y., Yao, L., and Fang, Z. (2022). Cooperative Perception Technology of Autonomous Driving in the Internet of Vehicles Environment: A Review. Sensors, 22.

2. Shan, M., Narula, K., Wong, Y.F., Worrall, S., Khan, M., Alexander, P., and Nebot, E. (2021). Demonstrations of Cooperative Perception: Safety and Robustness in Connected and Automated Vehicle Operations. Sensors, 21.

3. Schiegg, F.A., Llatser, I., Bischoff, D., and Volk, G. (2021). Collective Perception: A Safety Perspective. Sensors, 21.

4. Xiao, W., Mehdipour, N., Collin, A., Bin-Nun, A., Frazzoli, E., Duintjer Tebbens, R., and Belta, C. (2021). Rule-based Optimal Control for Autonomous Driving. arXiv.

5. Collin, A., Bilka, A., Pendleton, S., and Duintjer Tebbens, R. (2021). Safety of the Intended Driving Behavior Using Rulebooks. arXiv.

Cited by 1 article.

1. An Improved Deep Deterministic Policy Gradient Pantograph Active Control Strategy for High-Speed Railways. Electronics, 2024-09-06.