Optimal Policy of Multiplayer Poker via Actor-Critic Reinforcement Learning

Published: 2022-05-30 Issue: 6 Volume: 24 Page: 774
ISSN: 1099-4300
Container-title: Entropy
Language: en
Short-container-title: Entropy

Author:

Shi Daming, Guo Xudong, Liu Yi, Fan Wenhui

Abstract

Poker has long been considered a challenging problem in both artificial intelligence and game theory because it is characterized by imperfect information and uncertainty, features it shares with many real-world problems such as auctions, pricing, cyber security, and operations. However, it remains unclear whether playing an equilibrium policy is wise in multi-player games, and it is infeasible to validate theoretically whether a policy is optimal. Designing an effective optimal policy learning method therefore has greater practical significance. This paper proposes an optimal policy learning method for multi-player poker games based on Actor-Critic reinforcement learning. First, it builds an Actor network that makes decisions with imperfect information and a Critic network that evaluates policies with perfect information. Second, it proposes two novel policy update methods for multi-player poker: the asynchronous policy update algorithm (APU) for multi-player multi-policy scenarios and the dual-network asynchronous policy update algorithm (Dual-APU) for multi-player sharing-policy scenarios. Finally, it uses the most popular variant, six-player Texas hold ’em, to validate the performance of the proposed method. The experiments demonstrate that the policies learned by the proposed methods perform well and gain steadily compared with existing approaches. In sum, policy learning methods for imperfect-information games based on Actor-Critic reinforcement learning perform well on poker and can be transferred to other imperfect-information games. Training with perfect information and testing with imperfect information in this way offers an effective and explainable approach to learning an approximately optimal policy.
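The split described in the abstract, an Actor that decides from imperfect information and a Critic that evaluates with perfect information, can be illustrated with a minimal sketch. Everything below is an assumption made for illustration and is not taken from the paper: the linear models, the two-card toy betting game, the class name `AsymmetricActorCritic`, and the helpers `play_round` and `train` are all hypothetical, and the sketch does not implement APU or Dual-APU.

```python
import math
import random


def softmax(logits):
    """Numerically stable softmax over a list of floats."""
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


class AsymmetricActorCritic:
    """Toy linear actor-critic (hypothetical, not the paper's networks).

    The actor only sees the private observation; the critic sees the full state.
    """

    def __init__(self, obs_dim, state_dim, n_actions, lr=0.1, seed=0):
        self.rng = random.Random(seed)
        self.actor_w = [[0.0] * obs_dim for _ in range(n_actions)]
        self.critic_w = [0.0] * state_dim
        self.lr = lr

    def policy(self, obs):
        logits = [sum(w * x for w, x in zip(row, obs)) for row in self.actor_w]
        return softmax(logits)

    def value(self, state):
        return sum(w * x for w, x in zip(self.critic_w, state))

    def act(self, obs):
        probs = self.policy(obs)
        return self.rng.choices(range(len(probs)), weights=probs)[0]

    def update(self, obs, state, action, reward):
        # One-step episode: advantage = reward - V(full state).
        advantage = reward - self.value(state)
        # Critic: move V(state) toward the observed reward.
        for i, x in enumerate(state):
            self.critic_w[i] += self.lr * advantage * x
        # Actor: softmax policy gradient, computed from the observation only.
        probs = self.policy(obs)
        for k, row in enumerate(self.actor_w):
            grad = (1.0 if k == action else 0.0) - probs[k]
            for i, x in enumerate(obs):
                row[i] += self.lr * advantage * grad * x
        return advantage


def play_round(agent, rng):
    """Assumed toy game: each side holds a low (0) or high (1) card."""
    own, opp = rng.randint(0, 1), rng.randint(0, 1)
    obs = [1.0, float(own)]                # imperfect information: own card only
    state = [1.0, float(own), float(opp)]  # perfect information: both cards
    action = agent.act(obs)                # 0 = fold, 1 = bet
    reward = 0.0 if action == 0 else float((own > opp) - (own < opp))
    agent.update(obs, state, action, reward)


def train(episodes=5000, seed=0):
    agent = AsymmetricActorCritic(obs_dim=2, state_dim=3, n_actions=2, seed=seed)
    rng = random.Random(seed + 1)
    for _ in range(episodes):
        play_round(agent, rng)
    return agent


if __name__ == "__main__":
    agent = train()
    print("P(bet | high card):", agent.policy([1.0, 1.0])[1])
    print("P(bet | low card): ", agent.policy([1.0, 0.0])[1])
```

In this sketch the critic is needed only during training, so the learned actor can be used at test time with the private observation alone, which mirrors the train-with-perfect-information, test-with-imperfect-information idea stated in the abstract.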

Publisher

MDPI AG

Subject

General Physics and Astronomy

Link

https://www.mdpi.com/1099-4300/24/6/774/pdf

References: 28 articles.

1. One Jump Ahead: Challenging Human Supremacy in Checkers

2. Deep Blue

3. Mastering the game of Go with deep neural networks and tree search

4. Mastering the game of Go without human knowledge

5. Computer Poker: A Review;Rubin,2011

Cited by 3 articles.

1. Reinforcement Learning Speed Control Algorithm Based on Giants Teaching Mechanism for JT9D Low Pressure Shaft;IECON 2023- 49th Annual Conference of the IEEE Industrial Electronics Society;2023-10-16

2. Kdb-D2CFR: Solving Multiplayer imperfect-information games with knowledge distillation-based DeepCFR;Knowledge-Based Systems;2023-07

3. Curriculum Reinforcement Learning Based on K-Fold Cross Validation;Entropy;2022-12-06