An immediate-return reinforcement learning for the atypical Markov decision processes

Published:2022-12-13 Volume:16
ISSN:1662-5218
Container-title:Frontiers in Neurorobotics
Short-container-title:Front. Neurorobot.

Author:

Pan Zebang, Wen Guilin, Tan Zhao, Yin Shan, Hu Xiaoyan

Abstract

Atypical Markov decision processes (MDPs) are decision-making problems in which the goal is to maximize the immediate return of a single state transition. Many complex dynamic problems can be regarded as atypical MDPs, e.g., football trajectory control, approximation of compound Poincaré maps, and parameter identification. However, existing deep reinforcement learning (RL) algorithms are designed to maximize long-term returns, which wastes computing resources when they are applied to atypical MDPs. These algorithms are also limited by the estimation error of the value function, which leads to a poor policy. To overcome these limitations, this paper proposes an immediate-return algorithm for atypical MDPs with continuous action spaces by designing an unbiased, low-variance target Q-value and a simplified network framework. Two examples of atypical MDPs with uncertainty are then presented to illustrate the performance of the proposed algorithm: passing the football to a moving player and chipping the football over the human wall. Compared with existing deep RL algorithms such as deep deterministic policy gradient and proximal policy optimization, the proposed algorithm shows significant advantages in learning efficiency, effective rate of control, and computing resource usage.
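As a rough illustration of why a single-transition problem does not need a bootstrapped value target, the sketch below contrasts the standard one-step TD target with a target that is simply the observed reward. It is not the paper's algorithm: the abstract does not specify how the target Q-value or network is built, so the function names, the toy reward, and the discretized action set (the paper addresses continuous actions) are all assumptions made for illustration.

```python
import random


def td_target(reward, next_q, gamma=0.99, done=False):
    """Standard bootstrapped target used by long-horizon methods."""
    return reward + (0.0 if done else gamma * next_q)


def immediate_target(reward):
    """With one transition per episode, the return equals the reward."""
    return reward


def noisy_reward(action, rng):
    # Hypothetical toy task: reward peaks at action 0.5, plus small noise.
    return -(action - 0.5) ** 2 + rng.gauss(0.0, 0.01)


def fit_q_table(actions, reward_fn, steps=2000, lr=0.1, seed=0):
    """Tabular stand-in for a critic, regressed toward immediate rewards."""
    rng = random.Random(seed)
    q = {a: 0.0 for a in actions}
    for _ in range(steps):
        a = rng.choice(actions)           # uniform exploration over actions
        r = reward_fn(a, rng)             # single transition, then episode ends
        q[a] += lr * (immediate_target(r) - q[a])
    return q


actions = [0.0, 0.25, 0.5, 0.75, 1.0]
q = fit_q_table(actions, noisy_reward)
best = max(q, key=q.get)                  # greedy action under the learned Q
```

Because no next-state value enters the target, the estimate here is affected only by reward noise and not by errors in a bootstrapped value function, which is the intuition the abstract points to.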

Funder

National Natural Science Foundation of China

Publisher

Frontiers Media SA

Subject

Artificial Intelligence, Biomedical Engineering

References: 38 articles.

1. A Markovian decision process;Bellman;J. Mathem. Mech.,1957

2. Multi-objectivization and ensembles of shapings in reinforcement learning;Brys;Neurocomputing,2017

3. Deep reinforcement learning based trajectory planning under uncertain constraints;Chen;Front. Neurorob,2022

4. Reinforcement learning and the reward engineering principle;Dewey;2014 AAAI Spring Symposium Series,2014

5. Maximal sprinting speed of elite soccer players during training and matches;Djaoui;J. Strength Condit. Res,2017

Cited by 1 article.

1. Designing Aquaculture Monitoring System Based on Data Fusion through Deep Reinforcement Learning (DRL);Electronics;2023-04-27