A State-Compensated Deep Deterministic Policy Gradient Algorithm for UAV Trajectory Tracking

Published: 2022-06-21 Volume: 10 Issue: 7 Page: 496
ISSN: 2075-1702
Container-title: Machines
Language: en

Author:

Wu Jiying, Yang Zhong, Liao Luwei, He Naifeng, Wang Zhiyong, Wang Can

Abstract

Unmanned aerial vehicle (UAV) trajectory tracking controllers based on deep reinforcement learning generally train inefficiently in unknown environments, and their convergence is unstable. To address this, a Markov decision process (MDP) model for UAV trajectory tracking is established, and a state-compensated deep deterministic policy gradient (CDDPG) algorithm is proposed. An additional neural network (C-Net), whose input is the compensation state and whose output is the compensation action, is added to the network model of the deep deterministic policy gradient (DDPG) algorithm to assist exploration during training. The action output of the DDPG network is combined with the compensation output of the C-Net to form the action that interacts with the environment, enabling the UAV to track dynamic targets rapidly, and as accurately, continuously, and smoothly as possible. In addition, random noise is added to the generated action to allow exploration within a certain range and to make the action-value estimate more accurate. The proposed method is verified using the OpenAI Gym toolkit, and the simulation results show that: (1) adding the compensation network significantly improves training efficiency and effectively improves accuracy and convergence stability; (2) under the same computer configuration, the computational cost of the proposed algorithm is essentially the same as that of the QAC algorithm (an actor-critic algorithm based on the action value Q) and the DDPG algorithm; (3) during training, at the same tracking accuracy, the learning efficiency is about 70% higher than that of QAC and DDPG; (4) in the simulated tracking experiment, for the same training time, the tracking error of the proposed method after stabilization is about 50% lower than that of QAC and DDPG.
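The action-combination step described in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation: the function name `combine_action`, the Gaussian form of the exploration noise, the noise scale, and the clipping bounds are all assumptions made for illustration. The abstract states only that the DDPG action and the C-Net compensation action are combined and that random noise is added.

```python
import random


def combine_action(ddpg_action, comp_action, noise_std=0.1,
                   low=-1.0, high=1.0, rng=None):
    """Combine a DDPG actor output with a C-Net compensation output.

    Hypothetical helper: Gaussian noise and the [low, high] clipping
    range are assumptions, not details taken from the paper.
    """
    rng = rng or random.Random()
    if len(ddpg_action) != len(comp_action):
        raise ValueError("action vectors must have the same length")

    combined = []
    for a, c in zip(ddpg_action, comp_action):
        # Sum the two network outputs and add exploration noise.
        noisy = a + c + rng.gauss(0.0, noise_std)
        # Clip to the (assumed) valid action range of the environment.
        combined.append(min(high, max(low, noisy)))
    return combined


# Usage with placeholder values standing in for the two network outputs.
actor_out = [0.2, -0.5]   # would come from the DDPG actor
c_net_out = [0.1, 0.1]    # would come from the C-Net
action = combine_action(actor_out, c_net_out, rng=random.Random(0))
```

Setting `noise_std=0.0` gives the deterministic combined action, which would be the natural choice when evaluating a trained policy rather than exploring.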

Funder

Guizhou Provincial Science and Technology Projects

Publisher

MDPI AG

Subject

Electrical and Electronic Engineering, Industrial and Manufacturing Engineering, Control and Optimization, Mechanical Engineering, Computer Science (miscellaneous), Control and Systems Engineering

Link

https://www.mdpi.com/2075-1702/10/7/496/pdf

Cited by 4 articles.

1. Tensor product model transformation-based reinforcement learning neural network controller with guaranteed stability;Neurocomputing;2024-12

2. A Supervised Reinforcement Learning Algorithm for Controlling Drone Hovering;Drones;2024-02-20

3. Time-attenuating Twin Delayed DDPG for Quadrotor Tracking Control;2023 42nd Chinese Control Conference (CCC);2023-07-24

4. Viewpoint planning with transition management for active object recognition;Frontiers in Neurorobotics;2023-02-24