An Improved Proximal Policy Optimization Method for Low-Level Control of a Quadrotor

Published: 2022-04-06, Issue: 4, Volume: 11, Page: 105
ISSN: 2076-0825
Container-title: Actuators
Language: en
Short-container-title: Actuators

Author:

Xue Wentao, Wu Hangxing, Ye Hui, Shao Shuyi

Abstract

This paper proposes a novel deep reinforcement learning algorithm based on Proximal Policy Optimization (PPO) for fixed-point flight control of a quadrotor. A neural network controller maps the attitude and position of the quadrotor directly to the PWM signals of the four rotors. To constrain the size of policy updates, a PPO algorithm based on Monte Carlo approximations is proposed to obtain the optimal penalty coefficient. A policy optimization method with a penalized point probability distance preserves policy diversity at each policy update. The new surrogate objective function is introduced into the actor-critic network, which addresses the tendency of PPO to fall into local optima. Moreover, a compound reward function, designed by analyzing the various states the quadrotor may encounter in flight, accelerates the gradient algorithm along the policy update direction and improves the learning efficiency of the network. The simulations test the generalization ability of the offline policy by changing the wing length and payload of the quadrotor. Compared with the PPO method, the proposed method has higher learning efficiency and better robustness.
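
As a rough illustration of the penalized point probability distance idea mentioned in the abstract, a surrogate objective of this kind could be sketched as follows. This is not the authors' objective function: the squared-difference form of the distance, the fixed coefficient `beta`, and all function and parameter names are assumptions made for illustration, whereas the paper obtains its penalty coefficient through Monte Carlo approximations.

```python
import numpy as np


def penalized_surrogate(new_prob, old_prob, advantage, beta=5.0):
    """Surrogate objective (to be maximized) with a point-probability penalty.

    new_prob / old_prob: probabilities (or densities) that the new and old
    policies assign to the actions that were actually sampled.
    advantage: advantage estimates for those actions.
    beta: hypothetical fixed penalty coefficient (an assumption here).
    """
    new_prob = np.asarray(new_prob, dtype=float)
    old_prob = np.asarray(old_prob, dtype=float)
    advantage = np.asarray(advantage, dtype=float)

    # Importance ratio between the new and the old policy.
    ratio = new_prob / old_prob
    # Assumed distance: squared difference of the sampled-action probabilities.
    distance = (new_prob - old_prob) ** 2
    # Reward improvement, penalize moving far from the old policy.
    return float(np.mean(ratio * advantage - beta * distance))


# Example call with made-up numbers.
objective = penalized_surrogate([0.6, 0.25], [0.5, 0.25], [1.0, -1.0])
```

When the new and old policies agree on every sampled action, the penalty vanishes and the value reduces to the mean advantage, so the penalty only acts once an update starts to change the action probabilities.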

Funder

National Natural Science Foundation of China

Natural Science Foundation of the Jiangsu Higher Education Institutions of China

Publisher

MDPI AG

Subject

Control and Optimization, Control and Systems Engineering

Link

https://www.mdpi.com/2076-0825/11/4/105/pdf

References: 49 articles.

1. Effects of Touch, Voice, and Multimodal Input, and Task Load on Multiple-UAV Monitoring Performance During Simulated Manned-Unmanned Teaming in a Military Helicopter

2. Strawberry Maturity Classification from UAV and Near-Ground Imaging Using Deep Learning