Polygames: Improved zero learning
-
Published:2021-01-11
Issue:4
Volume:42
Page:244-256
-
ISSN:2468-2438
-
Container-title:ICGA Journal
-
Short-container-title:ICG
Author:
Cazenave Tristan (1), Chen Yen-Chi (2), Chen Guan-Wei (3), Chen Shi-Yu (3), Chiu Xian-Dong (3), Dehos Julien (4), Elsa Maria (3), Gong Qucheng (5), Hu Hengyuan (5), Khalidov Vasil (5), Li Cheng-Ling (3), Lin Hsin-I (3), Lin Yu-Jin (3), Martinet Xavier (5), Mella Vegard (5), Rapin Jeremy (5), Roziere Baptiste (5), Synnaeve Gabriel (5), Teytaud Fabien (4), Teytaud Olivier (5), Ye Shi-Cheng (3), Ye Yi-Jun (3), Yen Shi-Jim (3), Zagoruyko Sergey (5)
Affiliation:
1. LAMSADE, University Paris-Dauphine, PSL, France
2. National Taiwan Normal University, Taiwan
3. AILAB, Dong Hwa University, Taiwan
4. University Littoral Cote d’Opale, France
5. Facebook AI Research, France and United States
Abstract
Since DeepMind’s AlphaZero, Zero learning has quickly become the state-of-the-art method for many board games. It can be improved using a fully convolutional structure (no fully connected layer). Using such an architecture plus global pooling, we can create bots that are independent of the board size. Training can be made more robust by keeping track of the best checkpoints during training and by training against them. Using these features, we release Polygames, our framework for Zero learning, together with its library of games and its checkpoints. We won against strong humans at the game of Hex on a 19 × 19 board, including the human player with the best Elo rank on LittleGolem; incidentally, we also won against another Zero implementation, which was weaker than humans (in a discussion on LittleGolem, 19 × 19 Hex was said to be intractable for Zero learning). We also won at Havannah on a size-8 board against the strongest player, Eobllor, with excellent opening moves. Finally, we won several first places at the TAAI 2019 competitions and obtained positive results against strong bots in various games.
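The board-size independence claimed above can be illustrated with a toy forward pass in plain Python. This is a minimal sketch under our own assumptions, not the Polygames implementation: the function names, the single convolution layer, the channel-summing policy head and the tanh-of-mean value head are all hypothetical. The point it shows is that when every weight is a convolution kernel and the value head reads only from globally pooled features, the same parameters apply unchanged to any board size.

```python
import math
from typing import List, Tuple

Grid = List[List[float]]  # one feature map: H rows x W columns


def conv3x3(board: Grid, kernel: Grid) -> Grid:
    """3x3 convolution with zero padding: the output keeps the input's H x W."""
    h, w = len(board), len(board[0])
    out = [[0.0] * w for _ in range(h)]
    for i in range(h):
        for j in range(w):
            s = 0.0
            for di in (-1, 0, 1):
                for dj in (-1, 0, 1):
                    y, x = i + di, j + dj
                    if 0 <= y < h and 0 <= x < w:  # cells off the board count as 0
                        s += kernel[di + 1][dj + 1] * board[y][x]
            out[i][j] = s
    return out


def relu(g: Grid) -> Grid:
    return [[max(0.0, v) for v in row] for row in g]


def global_avg_pool(g: Grid) -> float:
    """Collapse a feature map of any size to a single number (its mean)."""
    return sum(map(sum, g)) / (len(g) * len(g[0]))


def forward(board: Grid, kernels: List[Grid]) -> Tuple[Grid, float]:
    # Trunk: convolutions only, so every weight is a 3x3 kernel whatever the board size.
    feats = [relu(conv3x3(board, k)) for k in kernels]
    h, w = len(board), len(board[0])
    # Policy head (hypothetical): one logit per cell, summing channels cell-wise,
    # so the output map has exactly the board's shape.
    policy = [[sum(f[i][j] for f in feats) for j in range(w)] for i in range(h)]
    # Value head (hypothetical): global pooling gives one number per channel,
    # so its length depends on the channel count only, never on H or W.
    pooled = [global_avg_pool(f) for f in feats]
    value = math.tanh(sum(pooled) / len(pooled))  # squashed into [-1, 1]
    return policy, value
```

For example, the same two kernels can be applied to a 9 × 9 and a 19 × 19 board without any change; in a trained network the kernels would be learned rather than fixed by hand.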
Subject
Computer Graphics and Computer-Aided Design, Human-Computer Interaction, Computational Mechanics, Computer Science (miscellaneous)