Improving Agent Decision Payoffs via a New Framework of Opponent Modeling

Author:

Liu Chanjuan 1, Cong Jinmiao 1, Zhao Tianhao 1, Zhu Enqiang 2 (ORCID)

Affiliation:

1. School of Computer Science and Technology, Dalian University of Technology, Dalian 116024, China

2. Institute of Computing Science and Technology, Guangzhou University, Guangzhou 510006, China

Abstract

The payoff of an agent depends on both the environment and the actions of other agents. Thus, the ability to model and predict the strategies and behaviors of other agents in an interactive decision-making scenario is one of the core functionalities of intelligent systems. State-of-the-art methods for opponent modeling mainly build an explicit model of opponents' actions, preferences, targets, etc., which the primary agent then uses to make decisions. However, it is more important for an agent to increase its payoff than to accurately predict opponents' behavior. Therefore, we propose a framework that synchronizes the opponent modeling and decision making of the primary agent by incorporating opponent modeling into reinforcement learning. In interactive decisions, the payoff depends not only on the behavioral characteristics of the opponent but also on the current state. Confounding the two obscures the effects of state and action, which then cannot be accurately encoded. To this end, state evaluation is separated from action evaluation in our model. Experimental results from two game environments, a simulated soccer game and a real game called quiz bowl, show that introducing opponent modeling effectively improves decision payoffs. In addition, the proposed opponent-modeling framework outperforms benchmark models.
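The separation of state evaluation from action evaluation described in the abstract can be illustrated with a minimal sketch. This is a hypothetical example, not the authors' implementation: it assumes a dueling-style value decomposition in which a scalar state value V and per-action advantages A are computed by separate heads, with an opponent embedding concatenated to the state features. All names, dimensions, and the linear "heads" are illustrative stand-ins for learned networks.

```python
import numpy as np

rng = np.random.default_rng(0)

STATE_DIM, OPP_DIM, N_ACTIONS = 8, 4, 3

# Random linear heads stand in for trained value/advantage networks.
W_v = rng.normal(size=(STATE_DIM + OPP_DIM, 1))          # state-value head
W_a = rng.normal(size=(STATE_DIM + OPP_DIM, N_ACTIONS))  # advantage head

def q_values(state, opp_embedding):
    """Combine separately evaluated state value and action advantages.

    The opponent embedding is appended to the state features, so both
    evaluations are conditioned on the modeled opponent.
    """
    x = np.concatenate([state, opp_embedding])
    v = x @ W_v            # scalar state value V(s, opp), shape (1,)
    a = x @ W_a            # action advantages A(s, a, opp), shape (N_ACTIONS,)
    # Subtracting the mean advantage keeps V and A identifiable.
    return (v + a - a.mean()).ravel()

state = rng.normal(size=STATE_DIM)
opp = rng.normal(size=OPP_DIM)
q = q_values(state, opp)
best_action = int(np.argmax(q))
```

Because the mean advantage is subtracted, the mean of the resulting Q-values equals the state value, so the two evaluations remain disentangled rather than confounded in a single Q-head.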

Funder

Natural Science Foundation of Guangdong Province of China

Natural Science Foundation of Liaoning Province of China

Publisher

MDPI AG

Subject

General Mathematics, Engineering (miscellaneous), Computer Science (miscellaneous)

