On Q-learning Convergence for Non-Markov Decision Processes

Published: 2018-07
Container-title: Proceedings of the Twenty-Seventh International Joint Conference on Artificial Intelligence

Author:

Sultan Javed Majeed¹, Marcus Hutter¹

Affiliation:

1. Research School of Computer Science, Australian National University

Abstract

Temporal-difference (TD) learning is an attractive, computationally efficient framework for model-free reinforcement learning. Q-learning is one of the most widely used TD learning techniques; it enables an agent to learn the optimal action-value function, i.e., the Q-value function. Despite its widespread use, Q-learning has only been proven to converge on Markov Decision Processes (MDPs) and Q-uniform abstractions of finite-state MDPs. On the other hand, most real-world problems are inherently non-Markovian: the full true state of the environment is not revealed by recent observations. In this paper, we investigate the behavior of Q-learning when applied to non-MDP and non-ergodic domains that may have infinitely many underlying states. We prove that the convergence guarantee of Q-learning can be extended to a class of such non-MDP problems, in particular, to some non-stationary domains. We show that state-uniformity of the optimal Q-value function is a necessary and sufficient condition for Q-learning to converge even in the case of infinitely many internal states.
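As a minimal sketch of the setting the abstract describes, the following Python toy runs tabular Q-learning on observations rather than on the hidden states that generate them. It is entirely hypothetical and is not the paper's construction, proof, or experiment: two hidden states (0 and 1) share one observation and, by construction, have identical optimal Q-values, so the optimal Q-value function is state-uniform in the abstract's sense. The environment table, the discount `GAMMA = 0.5`, the uniform random behaviour policy, the step-size schedule 1/n^0.6, and the function names `optimal_q_hidden` and `q_learning_on_observations` are all assumptions chosen for illustration. The toy is deliberately simple (its observation process happens to be Markov), so it only illustrates the state-uniformity condition, not the non-Markov, non-ergodic, infinite-state generality the paper treats.

```python
import random

# Hypothetical toy environment: hidden states 0, 1, 2. The agent only sees
# OBS[state]; hidden states 0 and 1 are aliased to the same observation 0.
OBS = {0: 0, 1: 0, 2: 1}
# TRANSITIONS[(state, action)] = (reward, next_state); deterministic for simplicity.
TRANSITIONS = {
    (0, 0): (1.0, 2), (0, 1): (0.0, 1),
    (1, 0): (1.0, 2), (1, 1): (0.0, 0),
    (2, 0): (0.0, 0), (2, 1): (2.0, 1),
}
ACTIONS = (0, 1)
GAMMA = 0.5  # assumed discount factor


def optimal_q_hidden(iters=200):
    """Value iteration on the hidden-state MDP (used only to check the result)."""
    q = {sa: 0.0 for sa in TRANSITIONS}
    for _ in range(iters):
        # The comprehension reads the previous table before q is rebound.
        q = {
            (s, a): r + GAMMA * max(q[(s2, b)] for b in ACTIONS)
            for (s, a), (r, s2) in TRANSITIONS.items()
        }
    return q


def q_learning_on_observations(steps=100_000, seed=0):
    """Tabular Q-learning that treats the observation as if it were the state."""
    rng = random.Random(seed)
    q = {(o, a): 0.0 for o in set(OBS.values()) for a in ACTIONS}
    visits = {key: 0 for key in q}
    state = 0  # hidden; never read by the learner except through OBS
    for _ in range(steps):
        obs = OBS[state]
        action = rng.choice(ACTIONS)  # uniform random behaviour policy (off-policy)
        reward, next_state = TRANSITIONS[(state, action)]
        next_obs = OBS[next_state]
        visits[(obs, action)] += 1
        # Step size 1/n^0.6: sum(alpha) diverges while sum(alpha^2) converges.
        alpha = 1.0 / visits[(obs, action)] ** 0.6
        target = reward + GAMMA * max(q[(next_obs, b)] for b in ACTIONS)
        q[(obs, action)] += alpha * (target - q[(obs, action)])
        state = next_state
    return q
```

Because the optimal Q-values of hidden states 0 and 1 coincide, the observation-level estimates returned by `q_learning_on_observations()` should approach the hidden-state values from `optimal_q_hidden()`; for instance, the estimate for observation 0 and action 0 should approach 8/3. Breaking that uniformity (e.g. giving hidden states 0 and 1 different rewards for the same action) removes the single target value that the aliased table entry could converge to.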

Publisher

International Joint Conferences on Artificial Intelligence Organization

Cited by 5 articles.

1. A controlling estimation bias method: Max_Mix_Min estimator for Q-learning;The Journal of Supercomputing;2024-05-26

2. Multi-view reinforcement learning for sequential decision-making with insufficient state information;International Journal of Machine Learning and Cybernetics;2023-10-24

3. Semi-Lipschitz functions and machine learning for discrete dynamical systems on graphs;Machine Learning;2022-03-23

4. Data Analytics of a Honeypot System Based on a Markov Decision Process Model;Recent Trends and Advances in Model Based Systems Engineering;2022

5. Non-Markovian Reinforcement Learning using Fractional Dynamics;2021 60th IEEE Conference on Decision and Control (CDC);2021-12-14