Multi-Agent Planning under Uncertainty with Monte Carlo Q-Value Function
Published: 2019-04-04
Volume: 9, Issue: 7, Page: 1430
ISSN: 2076-3417
Container-title: Applied Sciences
Language: en
Author: Zhang Jian, Pan Yaozong, Wang Ruili, Fang Yuqiang, Yang Haitao
Abstract
Decentralized partially observable Markov decision processes (Dec-POMDPs) are general multi-agent models for planning under uncertainty, but they are intractable to solve: the search space grows doubly exponentially as the horizon increases, which makes brute-force search impossible. Heuristic methods can quickly guide the search in the right direction and have been successful in many domains. In this paper, we propose a new Q-value function representation, the Monte Carlo Q-value function Q_MC, which is proved to be an upper bound on the optimal Q-value function Q*. We introduce two Monte Carlo tree search (MCTS) enhancements, heavy playout for the simulation policy and adaptive sampling, to speed up the computation of Q_MC. We then present the clustering and expansion with Monte Carlo algorithm (CEMC), an offline planning algorithm that uses Q_MC as its Q-value function and builds on generalized multi-agent A* with incremental clustering and expansion (GMAA*-ICE, or ICE). CEMC calculates Q-value functions as required, without computing and storing all Q-value functions, and employs an extended policy pruning strategy. Finally, we present empirical results demonstrating that CEMC outperforms the best heuristic algorithm with a compact Q-value representation in terms of runtime for the same horizon, and uses less memory on larger problems.
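To make the sampling idea concrete, below is a minimal Python sketch of Monte Carlo Q-value estimation with a heavy-playout rollout policy. It is an illustration under assumptions, not the paper's implementation: the names mc_q_estimate, sample_state, step, and rollout_policy are hypothetical interfaces, and the sketch operates on fully observable states rather than the joint observation histories over which Q_MC is actually defined in the paper.

    import random

    def mc_q_estimate(sample_state, step, rollout_policy, joint_action,
                      horizon, num_samples=1000):
        """Monte Carlo Q-value estimate: the average return of sampled
        rollouts that take `joint_action` first and then follow
        `rollout_policy`. Using an informed rollout policy instead of a
        uniformly random one is the "heavy playout" enhancement."""
        total = 0.0
        for _ in range(num_samples):
            state = sample_state()                  # draw a state from the current belief
            ret, state = step(state, joint_action)  # immediate reward of the evaluated action
            for _ in range(horizon - 1):            # play out the remaining steps
                reward, state = step(state, rollout_policy(state))
                ret += reward
            total += ret
        return total / num_samples

    # Toy usage on a two-state chain (hypothetical problem, for illustration).
    if __name__ == "__main__":
        def sample_state():
            return random.choice([0, 1])

        def step(state, action):
            reward = 1.0 if state == 0 and action == 1 else 0.0
            return reward, (state + action) % 2

        def rollout_policy(state):
            return 1 - state  # a simple informed heuristic

        print(mc_q_estimate(sample_state, step, rollout_policy,
                            joint_action=1, horizon=5))

The sketch only captures the sampling mechanics; in the paper, Q_MC is additionally shown to upper-bound Q*, which is what makes it usable as an admissible heuristic in the GMAA*-style search.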
Funder: Space Engineering University
Subject: Fluid Flow and Transfer Processes, Computer Science Applications, Process Chemistry and Technology, General Engineering, Instrumentation, General Materials Science