AlphaZe∗∗: AlphaZero-like baselines for imperfect information games are surprisingly strong

Published: 2023-05-12 Volume: 6
ISSN:2624-8212
Container-title:Frontiers in Artificial Intelligence
Short-container-title:Front. Artif. Intell.

Author:

Blüml Jannis, Czech Johannes, Kersting Kristian

Abstract

In recent years, deep neural networks for strategy games have made significant progress. AlphaZero-like frameworks which combine Monte-Carlo tree search with reinforcement learning have been successfully applied to numerous games with perfect information. However, they have not been developed for domains where uncertainty and unknowns abound, and are therefore often considered unsuitable due to imperfect observations. Here, we challenge this view and argue that they are a viable alternative for games with imperfect information—a domain currently dominated by heuristic approaches or methods explicitly designed for hidden information, such as oracle-based techniques. To this end, we introduce a novel algorithm based solely on reinforcement learning, called AlphaZe∗∗, which is an AlphaZero-based framework for games with imperfect information. We examine its learning convergence on the games Stratego and DarkHex and show that it is a surprisingly strong baseline, while using a model-based approach: it achieves similar win rates against other Stratego bots like Pipeline Policy Space Response Oracle (P2SRO), while not winning in direct comparison against P2SRO or reaching the much stronger numbers of DeepNash. Compared to heuristics and oracle-based approaches, AlphaZe∗∗ can easily deal with rule changes, e.g., when more information than usual is given, and drastically outperforms other approaches in this respect.

Funder

Hessisches Ministerium für Wissenschaft und Kunst

Publisher

Frontiers Media SA

Subject

Artificial Intelligence

Cited by 3 articles.

1. Strategic Reparameterization for Enhanced Inference in Imperfect Information Games: A Neural Network Approach;Lecture Notes in Computer Science;2024

2. Efficiently Training Neural Networks for Imperfect Information Games by Sampling Information Sets;Lecture Notes in Computer Science;2024

3. Weighting Information Sets with Siamese Neural Networks in Reconnaissance Blind Chess;2023 IEEE Conference on Games (CoG);2023-08-21