TD-Gammon, a Self-Teaching Backgammon Program, Achieves Master-Level Play

Published:1994-03 Issue:2 Volume:6 Page:215-219
ISSN:0899-7667
Container-title:Neural Computation
language:en

Author:

Tesauro Gerald¹

Affiliation:

1. IBM Thomas J. Watson Research Center, P. O. Box 704, Yorktown Heights, NY 10598 USA

Abstract

TD-Gammon is a neural network that is able to teach itself to play backgammon solely by playing against itself and learning from the results, based on the TD(λ) reinforcement learning algorithm (Sutton 1988). Despite starting from random initial weights (and hence random initial strategy), TD-Gammon achieves a surprisingly strong level of play. With zero knowledge built in at the start of learning (i.e., given only a “raw” description of the board state), the network learns to play at a strong intermediate level. Furthermore, when a set of hand-crafted features is added to the network's input representation, the result is a truly staggering level of performance: the latest version of TD-Gammon is now estimated to play at a strong master level that is extremely close to the world's best human players.
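To make the TD(λ) update mentioned in the abstract concrete, here is a minimal sketch of tabular TD(λ) with accumulating eligibility traces. It is not TD-Gammon's implementation: the task is assumed to be a toy five-state random walk (reward 1 for exiting on the right, 0 on the left) rather than backgammon, the value function is a plain table rather than a neural network, and the function name `td_lambda_random_walk` and all parameter values are illustrative choices.

```python
import random


def td_lambda_random_walk(episodes=5000, n_states=5, alpha=0.02, lam=0.7, seed=0):
    """Estimate state values on a symmetric random walk with tabular TD(lambda).

    Assumed toy setup (not from the paper): states 0..n_states-1, each step
    moves left or right with equal probability, stepping off the right end
    gives reward 1, stepping off the left end gives reward 0, no discounting.
    """
    rng = random.Random(seed)
    values = [0.5] * n_states              # one value estimate per state
    for _ in range(episodes):
        traces = [0.0] * n_states          # eligibility traces, reset per episode
        state = n_states // 2              # every episode starts in the middle
        while True:
            next_state = state + rng.choice((-1, 1))
            if next_state < 0:             # terminated on the left
                reward, next_value, done = 0.0, 0.0, True
            elif next_state >= n_states:   # terminated on the right
                reward, next_value, done = 1.0, 0.0, True
            else:
                reward, next_value, done = 0.0, values[next_state], False

            # Temporal-difference error between successive predictions.
            delta = reward + next_value - values[state]

            # Decay all traces by lambda, then mark the state just visited.
            traces = [lam * e for e in traces]
            traces[state] += 1.0

            # Every recently visited state shares in the correction.
            for s in range(n_states):
                values[s] += alpha * delta * traces[s]

            if done:
                break
            state = next_state
    return values


if __name__ == "__main__":
    print([round(v, 2) for v in td_lambda_random_walk()])
```

For this toy walk the true values are 1/6, 2/6, ..., 5/6 from left to right, so the learned estimates should land near those numbers, with some noise from the constant step size. TD-Gammon applies the same idea with a neural network in place of the table and self-play game outcomes in place of the walk's terminal reward.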

Publisher

MIT Press - Journals

Subject

Cognitive Neuroscience, Arts and Humanities (miscellaneous)

Link

https://www.mitpressjournals.org/doi/pdf/10.1162/neco.1994.6.2.215

References: 6 articles.

1. Learning to predict by the methods of temporal differences

2. Neurogammon Wins Computer Olympiad

3. Practical issues in temporal difference learning

Cited by 378 articles.

1. Reinforcement learning in reliability and maintenance optimization: A tutorial;Reliability Engineering & System Safety;2024-11

2. Introducing Tales of Tribute AI Competition;2024 IEEE Conference on Games (CoG);2024-08-05

3. Learning to express reward prediction error-like dopaminergic activity requires plastic representations of time;Nature Communications;2024-07-12

4. DDQNC-P: A framework for civil aircraft tactical synergetic trajectory planning under adverse weather conditions;Chinese Journal of Aeronautics;2024-07

5. Improving Zero-Shot Coordination with Diversely Rewarded Partner Agents;2024 International Joint Conference on Neural Networks (IJCNN);2024-06-30