Decoupled Monte Carlo Tree Search for Cooperative Multi-Agent Planning

Published:2023-02-02 Issue:3 Volume:13 Page:1936
ISSN:2076-3417
Container-title:Applied Sciences
language:en
Short-container-title:Applied Sciences

Author:

Asik Okan¹, Aydemir Fatma Başak¹, Akın Hüseyin Levent¹

Affiliation:

1. Department of Computer Engineering, Bogazici University, Istanbul 34342, Turkey

Abstract

The complexity of a cooperative multi-agent planning problem grows exponentially with the number of agents. Decoupled planning is one viable approach to reducing this complexity. By integrating decoupled planning with Monte Carlo Tree Search, we present a new scalable planning approach. The search tree maintains the updates of each agent's individual actions separately. However, this separation introduces coordination and action synchronization problems. When an agent does not know the action of the other agent, it uses the returned reward to deduce the desirability of its own action. When a deterministic action selection policy is used in the Monte Carlo Tree Search algorithm, the actions of the agents become synchronized, so only some of the possible action combinations are evaluated. We show the effect of action synchronization on different problems and propose stochastic action selection policies. To address the coordination problem in decoupled planning, we also propose a combined method that uses decoupled planning as a pruning step for centralized planning: a centralized search tree is created with a subset of joint actions selected by the evaluation of decoupled planning. We empirically show that, when stochastic action selection is used, decoupled planning performs similarly to a centralized planning algorithm in repeated matrix games and multi-agent planning problems. We also show that the combined method improves the performance of the decoupled method in different problems. On a warehouse commissioning problem, we compare the proposed method to a decoupled method; our method achieved more than 10% improvement in performance.
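The action synchronization effect described in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation: it reduces the search tree to a single state (a repeated matrix game), and the payoff matrix, the `DecoupledAgent` class, the epsilon-greedy policy, and all parameter values are assumptions chosen for illustration. Each agent keeps statistics for its own actions only and sees just the shared reward.

```python
import math
import random

# Hypothetical shared-reward matrix (rows: agent 0, columns: agent 1).
# The best joint action (0, 1) is deliberately placed off the diagonal.
PAYOFF = [
    [0.2, 1.0, 0.0],
    [0.0, 0.5, 0.0],
    [0.0, 0.0, 0.3],
]


class DecoupledAgent:
    """Tracks visit counts and mean rewards for its own actions only."""

    def __init__(self, n_actions, rng):
        self.counts = [0] * n_actions
        self.means = [0.0] * n_actions
        self.rng = rng

    def select_ucb(self, c=1.0):
        # Deterministic UCB1: untried actions first, ties go to the lowest index.
        for action, n in enumerate(self.counts):
            if n == 0:
                return action
        total = sum(self.counts)
        scores = [
            mean + c * math.sqrt(math.log(total) / n)
            for mean, n in zip(self.means, self.counts)
        ]
        return max(range(len(scores)), key=scores.__getitem__)

    def select_epsilon_greedy(self, epsilon=0.2):
        # Stochastic policy: explore uniformly with probability epsilon.
        if self.rng.random() < epsilon:
            return self.rng.randrange(len(self.counts))
        return max(range(len(self.means)), key=self.means.__getitem__)

    def update(self, action, reward):
        # Incremental mean update for the chosen action.
        self.counts[action] += 1
        self.means[action] += (reward - self.means[action]) / self.counts[action]


def run(policy, rounds=300, seed=0):
    """Play the repeated game and return the set of joint actions evaluated."""
    rng = random.Random(seed)
    agents = [DecoupledAgent(3, rng), DecoupledAgent(3, rng)]
    seen = set()
    for _ in range(rounds):
        a0 = policy(agents[0])
        a1 = policy(agents[1])
        reward = PAYOFF[a0][a1]  # both agents receive the same reward
        agents[0].update(a0, reward)
        agents[1].update(a1, reward)
        seen.add((a0, a1))
    return seen


if __name__ == "__main__":
    print("deterministic UCB1:", sorted(run(DecoupledAgent.select_ucb)))
    print("epsilon-greedy:    ", sorted(run(DecoupledAgent.select_epsilon_greedy)))
```

Under deterministic UCB1 the two agents start with identical statistics, always receive the same reward, and therefore always pick the same action index, so only the diagonal joint actions are ever evaluated and the better joint action (0, 1) is never tried. A stochastic policy breaks this lockstep and lets off-diagonal combinations be sampled.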

Funder

the Turkish Directorate of Strategy and Budget under the TAM Project

Publisher

MDPI AG

Subject

Fluid Flow and Transfer Processes, Computer Science Applications, Process Chemistry and Technology, General Engineering, Instrumentation, General Materials Science

Link

https://www.mdpi.com/2076-3417/13/3/1936/pdf
