Tjong: A transformer‐based Mahjong AI via hierarchical decision‐making and fan backward

Published: 2024-03-21
Volume: 9, Issue: 4, Pages: 982-995
ISSN: 2468-2322
Journal: CAAI Transactions on Intelligence Technology
Journal abbreviation: CAAI Trans on Intel Tech
Language: English

Authors:

Li Xiali (1, 2), Liu Bo (1, 2), Wei Zhi (3), Wang Zhaoqi (1, 2), Wu Licheng (1, 2)

Affiliations:

1. School of Information and Engineering, Minzu University of China, Beijing, China

2. Key Laboratory of Ethnic Language Intelligent Analysis and Security Governance of MOE, Minzu University of China, Beijing, China

3. Department of Computer Science, New Jersey Institute of Technology, Newark, New Jersey, USA

Abstract

Mahjong, a complex game with hidden information and sparse rewards, poses significant challenges for AI. Existing Mahjong AIs require substantial hardware resources and extensive datasets to reach strong performance. The authors propose Tjong, a transformer-based Mahjong AI built on hierarchical decision-making. Using self-attention, Tjong captures tile patterns and game dynamics, and it decouples each decision into two distinct stages, an action decision and a tile decision, which considerably reduces decision complexity. In addition, a fan backward technique is proposed to address the sparse rewards by allocating reversed (backward) rewards to actions based on the winning hand. Tjong has 15M parameters and was trained with supervised learning on approximately 0.5M data samples for 7 days on a single server with 2 GPUs. The action decision reached an accuracy of 94.63%, the claim decision 98.55% and the discard decision 81.51%. In a tournament, Tjong outperformed AIs based on CNN, MLP, RNN, ResNet and ViT architectures, scoring up to 230% higher than its opponents. After a further 3 days of reinforcement learning training, it ranked within the top 1% of the Botzone platform leaderboard.
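The two ideas in the abstract, hierarchical decision-making and fan backward, can be sketched in a few lines. This is a minimal illustration under stated assumptions, not the authors' implementation. The action vocabulary (`ACTIONS`), the helper names (`masked_argmax`, `hierarchical_decision`, `Step`, `fan_backward`) and the reward-splitting rule are all hypothetical. Stage 1 picks an action type from the action head's logits. Stage 2 consults a tile head only when the chosen action needs a tile, so the model never scores every (action, tile) pair jointly. The fan-backward function assumes that, after a win, the hand's fan value is split evenly over the steps consistent with the final hand: claims of tiles that appear in it and discards of tiles that do not.

```python
from dataclasses import dataclass
from typing import Dict, List, Optional, Sequence, Set, Tuple

# Hypothetical action vocabulary; the paper's exact action set is not listed here.
ACTIONS = ["pass", "discard", "chow", "pung", "kong", "hu"]
CLAIMS = {"chow", "pung", "kong"}
NO_TILE = {"pass", "hu"}  # assumed: these actions need no tile decision
NUM_TILE_TYPES = 34  # standard suited and honour tile types, flowers excluded


def masked_argmax(logits: Sequence[float], legal: Sequence[bool]) -> int:
    """Index of the largest logit among entries marked legal."""
    best, best_val = -1, float("-inf")
    for i, (x, ok) in enumerate(zip(logits, legal)):
        if ok and x > best_val:
            best, best_val = i, x
    if best < 0:
        raise ValueError("no legal option")
    return best


def hierarchical_decision(
    action_logits: Sequence[float],
    legal_actions: Sequence[bool],
    tile_logits: Dict[str, Sequence[float]],
    legal_tiles: Dict[str, Sequence[bool]],
) -> Tuple[str, Optional[int]]:
    """Stage 1: choose an action type. Stage 2: choose a tile for that action."""
    action = ACTIONS[masked_argmax(action_logits, legal_actions)]
    if action in NO_TILE:
        return action, None
    # Tile logits are assumed to come from a head conditioned on `action`.
    return action, masked_argmax(tile_logits[action], legal_tiles[action])


@dataclass
class Step:
    action: str
    tile: Optional[int]


def fan_backward(trajectory: List[Step], winning_tiles: Set[int], fan: float) -> List[float]:
    """Hypothetical fan-backward rule: after a win worth `fan`, look back over
    the trajectory and split `fan` evenly over steps consistent with the final hand."""

    def consistent(s: Step) -> bool:
        if s.action in CLAIMS:
            return s.tile in winning_tiles  # claimed tile ended up in the hand
        if s.action == "discard":
            return s.tile not in winning_tiles  # discarded tile was not needed
        return False

    contributing = [i for i, s in enumerate(trajectory) if consistent(s)]
    rewards = [0.0] * len(trajectory)
    for i in contributing:
        rewards[i] = fan / len(contributing)
    return rewards
```

With this hypothetical vocabulary, the split means the action head scores 6 options and the tile head 34, instead of one flat output over every action-tile combination. The even split in `fan_backward` is only one possible attribution rule.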

Funder

National Natural Science Foundation of China

Publisher

Institution of Engineering and Technology (IET)
