A Reinforcement Learning Method of Solving Markov Decision Processes: An Adaptive Exploration Model Based on Temporal Difference Error

Published:2023-10-08 Issue:19 Volume:12 Page:4176
ISSN:2079-9292
Container-title:Electronics
language:en
Short-container-title:Electronics

Author:

Wang Xianjia¹,², Yang Zhipeng¹,³, Chen Guici¹,³, Liu Yanli¹,³

Affiliation:

1. Hubei Province Key Laboratory of Systems Science in Metallurgical Process, Wuhan University of Science and Technology, Wuhan 430065, China

2. Economics and Management School, Wuhan University, Wuhan 430072, China

3. College of Science, Wuhan University of Science and Technology, Wuhan 430065, China

Abstract

Traditional backward recursion methods face a fundamental challenge in solving Markov Decision Processes (MDPs): they require knowledge of the optimal expected payoffs, yet that knowledge cannot be acquired during the decision-making process. To address this challenge and strike a reasonable balance between exploration and exploitation, this paper proposes a model called Temporal Error-based Adaptive Exploration (TEAE), which uses reinforcement learning techniques to overcome the limitations of traditional MDP solving methods. TEAE works in two ways. First, it dynamically adjusts exploration probabilities based on the agent’s performance. Second, it approximates the optimal expected payoff function for subprocesses after specific states and times by using deep convolutional neural networks to minimize the temporal difference error between the dual networks. The paper also extends TEAE to the DQN-PER and DDQN-PER methods, yielding the DQN-PER-TEAE and DDQN-PER-TEAE variants. These variants demonstrate the generality and compatibility of TEAE with existing reinforcement learning techniques and support the applicability of the approach in a broader MDP reinforcement learning context. To further validate TEAE, the paper evaluates it on multiple metrics, compares its performance with other MDP reinforcement learning methods, and conducts case studies. Simulation results and case analyses consistently indicate that TEAE exhibits higher efficiency.

Publisher

MDPI AG

Subject

Electrical and Electronic Engineering, Computer Networks and Communications, Hardware and Architecture, Signal Processing, Control and Systems Engineering

Link

https://www.mdpi.com/2079-9292/12/19/4176/pdf

References: 45 articles.

1. Recurrent prediction model for partially observable MDPs; Xie; Inf. Sci., 2023

2. Infinite horizon Markov decision processes with unknown or variable discount factors; White; Eur. J. Oper. Res., 1987

3. A Machine Learning–Enabled Partially Observable Markov Decision Process Framework for Early Sepsis Prediction; Liu; INFORMS J. Comput., 2022

4. Puterman, M.L. (2014). Markov Decision Processes: Discrete Stochastic Dynamic Programming, John Wiley & Sons.

5. Bellman, R., and Kalaba, R.E. (1965). Dynamic Programming and Modern Control Theory, Academic Press.

Cited by 2 articles.

1. Playing Flappy Bird Based on Motion Recognition Using a Transformer Model and LIDAR Sensor;Sensors;2024-03-16

2. Mean Field Multi-Agent Reinforcement Learning Method for Area Traffic Signal Control;Electronics;2023-11-17