A general reinforcement learning algorithm that masters chess, shogi, and Go through self-play

Published:2018-12-07 Issue:6419 Volume:362 Page:1140-1144
ISSN:0036-8075
Container-title:Science
language:en
Short-container-title:Science

Author:

Silver David¹², Hubert Thomas¹, Schrittwieser Julian¹, Antonoglou Ioannis¹, Lai Matthew¹, Guez Arthur¹, Lanctot Marc¹, Sifre Laurent¹, Kumaran Dharshan¹, Graepel Thore¹, Lillicrap Timothy¹, Simonyan Karen¹, Hassabis Demis¹

Affiliation:

1. DeepMind, 6 Pancras Square, London N1C 4AG, UK.

2. University College London, Gower Street, London WC1E 6BT, UK.

Abstract

One program to rule them all

Computers can beat humans at increasingly complex games, including chess and Go. However, these programs are typically constructed for a particular game, exploiting its properties, such as the symmetries of the board on which it is played. Silver et al. developed a program called AlphaZero, which taught itself to play Go, chess, and shogi (a Japanese version of chess) (see the Editorial, and the Perspective by Campbell). AlphaZero managed to beat state-of-the-art programs specializing in these three games. The ability of AlphaZero to adapt to various game rules is a notable step toward achieving a general game-playing system.

Science, this issue p. 1140; see also pp. 1087 and 1118

Publisher

American Association for the Advancement of Science (AAAS)

Subject

Multidisciplinary

References: 41 articles.

1. Deep Blue

2. F.-H. Hsu Behind Deep Blue: Building the Computer That Defeated the World Chess Champion (Princeton Univ. 2002).

3. A Strategic Metagame Player for General Chess-Like Games

4. General game playing: overview of the AAAI competition;Genesereth M. R.;AI Mag.,2005

5. Some Studies in Machine Learning Using the Game of Checkers. II—Recent Progress

Cited by 1963 articles.

1. Proposal and generation of endgame puzzles for an imperfect information game Geister;Entertainment Computing;2025-01

2. Misconduct in Post-Selections and Deep Learning;2023 8th International Conference on Control, Robotics and Cybernetics (CRC);2024-12-22

3. Generative design for complex floorplans in high-rise residential buildings: A Monte Carlo tree search-based self-organizing multi-agent system (MCTS-MAS) solution;Expert Systems with Applications;2024-12

4. Effect of Q-learning on the evolution of cooperation behavior in collective motion: An improved Vicsek model;Applied Mathematics and Computation;2024-12

5. Model-based offline reinforcement learning framework for optimizing tunnel boring machine operation;Underground Space;2024-12