Online Planning for Large Markov Decision Processes with Hierarchical Decomposition

Published:2015-08-13 Issue:4 Volume:6 Page:1-28
ISSN:2157-6904
Container-title:ACM Transactions on Intelligent Systems and Technology
language:en
Short-container-title:ACM Trans. Intell. Syst. Technol.

Author:

Bai Aijun¹,Wu Feng¹,Chen Xiaoping¹

Affiliation:

1. University of Science and Technology of China, China

Abstract

Markov decision processes (MDPs) provide a rich framework for planning under uncertainty. However, exactly solving a large MDP is usually intractable due to the “curse of dimensionality”: the state space grows exponentially with the number of state variables. Online algorithms tackle this problem by avoiding computing a policy for the entire state space. On the other hand, since an online algorithm has to find a near-optimal action in almost real time, the computation time is often very limited. In the context of reinforcement learning, MAXQ is a value function decomposition method that exploits the underlying structure of the original MDP and decomposes it into a combination of smaller subproblems arranged over a task hierarchy. In this article, we present MAXQ-OP, a novel online planning algorithm for large MDPs that utilizes MAXQ hierarchical decomposition in online settings. Compared to traditional online planning algorithms, MAXQ-OP is able to reach much deeper states in the search tree with relatively less computation time by exploiting MAXQ hierarchical decomposition online. We empirically evaluate our algorithm in the standard Taxi domain, a common benchmark for MDPs, to show the effectiveness of our approach. We have also conducted a long-term case study in a highly complex simulated soccer domain and developed a team named WrightEagle that has won five world championships and finished as runner-up five times in the past 10 years of RoboCup Soccer Simulation 2D annual competitions. The results in the RoboCup domain confirm the scalability of MAXQ-OP to very large domains.

Funder

National Natural Science Foundation of China

Publisher

Association for Computing Machinery (ACM)

Subject

Artificial Intelligence,Theoretical Computer Science

Link

https://dl.acm.org/doi/pdf/10.1145/2717316

References: 53 articles.

1. Aijun Bai, Feng Wu, and Xiaoping Chen. 2013a. Bayesian mixture modelling and inference based Thompson sampling in Monte-Carlo tree search. In Advances in Neural Information Processing Systems 26. 1646--1654.

Cited by 23 articles.

1. Solving nonstationary Markov decision processes via contextual decomposition: A military air battle management application;Expert Systems with Applications;2023-12

2. Health Technology Assessment (HTA) Using MAFEIP (Monitoring and Assessment Framework for the European Innovation Partnership on Active and Healthy Ageing);The International Encyclopedia of Health Communication;2022-09-29

3. Performance Study of Minimax and Reinforcement Learning Agents Playing the Turn-based Game Iwoki;Applied Artificial Intelligence;2021-06-15

4. Cost-Effectiveness Assessment of Internet of Things in Smart Cities;Frontiers in Digital Health;2021-05-24

5. Pricing-aware Real-time Charging Scheduling and Charging Station Expansion for Large-scale Electric Buses;ACM Transactions on Intelligent Systems and Technology;2021-02-28