Actor-critic objective penalty function method: an adaptive strategy for trajectory tracking in autonomous driving

Published:2023-09-30
ISSN:2199-4536
Container-title:Complex & Intelligent Systems
language:en
Short-container-title:Complex Intell. Syst.

Author:

Wang Bo, Bai Fusheng, Zhang Ke

Abstract

Trajectory tracking is a key technology for controlling autonomous vehicles so that they follow a reference trajectory effectively and stably. Handling the various constraints in trajectory tracking is very challenging. The recently proposed generalized exterior point method (GEP) shows high computational efficiency and closed-loop performance in solving the constrained trajectory tracking problem. However, the neural networks used in the GEP may suffer from ill-conditioning during model training, which results in slow or even non-convergent training and in control outputs from the policy network that are suboptimal or even severely constraint-violating. To deal effectively with the large-scale nonlinear state-wise constraints and avoid the ill-conditioning issue, we propose a model-based reinforcement learning (RL) method called the actor-critic objective penalty function method (ACOPFM) for trajectory tracking in autonomous driving. We adopt an integrated decision and control (IDC)-based planning and control scheme to transform the trajectory tracking problem into MPC-based nonlinear programming problems, and we embed the objective penalty function method into an actor-critic solution framework. Each nonlinear programming problem is transformed into an unconstrained optimization problem, which is used as the loss function for updating the policy network. The ill-conditioning issue is avoided by alternating between gradient descent and adaptive adjustment of the penalty parameter. The convergence of ACOPFM is proved. The simulation results demonstrate that ACOPFM converges to the optimal control strategy quickly and steadily, and performs well in the multi-lane test scenario.

Funder

Chongqing Science and Technology Commission

Publisher

Springer Science and Business Media LLC

Subject

Computational Mathematics,Engineering (miscellaneous),Information Systems,Artificial Intelligence

Link

https://link.springer.com/content/pdf/10.1007/s40747-023-01238-6.pdf

Reference: 65 articles.

1. Badue C, Guidolini R, Carneiro RV, Azevedo P, Cardoso VB, Forechi A, Jesus L, Berriel R, Paixao TM, Mutz F, et al (2021) Self-driving cars: A survey. Expert Systems with Applications 165:113816

2. González D, Pérez J, Milanés V, Nashashibi F (2015) A review of motion planning techniques for automated vehicles. IEEE Trans Intell Transp Syst 17(4):1135–1145

3. Huang Z, Li H, Li W, Liu J, Huang C, Yang Z, Fang W (2021) A new trajectory tracking algorithm for autonomous vehicles based on model predictive control. Sensors 21(21):7165

4. Chatzikomis C, Sorniotti A, Gruber P, Zanchetta M, Willans D, Balcombe B (2018) Comparison of path tracking and torque-vectoring controllers for autonomous electric vehicles. IEEE Transactions on Intelligent Vehicles 3(4):559–570

5. Li L, Li J, Zhang S (2021) Review article: State-of-the-art trajectory tracking of autonomous vehicles. Mechanical Sciences 12(1):419–432