The Role of a Reward in Shaping Multiple Football Agents’ Behavior: An Empirical Study

Published: 2023-03-12
Journal: Applied Sciences (ISSN 2076-3417)
Volume: 13, Issue: 6, Article number: 3622
Language: English

Authors:

Kim So¹, Kim Ji¹, Lee Jee¹,²

Affiliations:

1. Graduate School of Artificial Intelligence and Informatics, Sangmyung University, Seoul 03016, Republic of Korea

2. Department of Human-Centered Artificial Intelligence, Sangmyung University, Seoul 03016, Republic of Korea

Abstract

In reinforcement learning (RL), a scalar reward is regarded as a sufficient means of guiding an agent’s behavior: it drives the agent to seek an optimal policy for solving a problem (or achieving a goal) under uncertainty. In this paper, we aimed to probe the benefit of such a scalar reward in shaping a coordination policy, using artificial football scenarios. In football, a team normally practices two types of strategy: a primary formation, which is the team’s default strategy regardless of the opponent (e.g., 4-4-2 or 4-3-3), and an adaptive strategy, which is a reactive tactic that responds to spontaneous changes in the opponent’s play. We focused on the primary formation as a team coordination policy that can be trained from a reward using multi-agent RL (MARL) algorithms. We expected that once a team of football agents had successfully learned a primary formation through this reward-driven approach, it would exhibit that formation against a variety of opponents, including teams it had never faced while being trained to collect the reward. To examine this behavior precisely, we ran a large number of simulations with twelve artificial football teams in the AI World Cup environment. We trained two MARL-based football teams with a team guided by a random-walk formation, and then played matches against the most competitive of the twelve teams, which the MARL-based teams had never played before. Analysis of each team’s average score and competitiveness showed that the proposed MARL-based teams outperformed the others in competitiveness, although they did not achieve the best average score.
This indicates that the coordination policy of the MARL-based teams remained moderately consistent against both known and unknown opponents, owing to their successful learning of a primary formation under the guidance of a scalar reward.
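
The abstract’s central idea, that a single scalar reward can be enough to shape a team’s coordination policy, can be illustrated with a toy example. This is a hypothetical sketch, not the authors’ method: it replaces the MARL algorithms and the AI World Cup environment with two agents on a five-cell, one-dimensional pitch, a made-up target “formation” (cells 1 and 3, in either order), and centralized Q-value iteration over joint actions. Q-value iteration is a dynamic-programming method that uses the known transition model rather than learning from sampled experience; it is used here only because it is deterministic and exact. All names (`N_CELLS`, `FORMATION`, `team_reward`, `q_value_iteration`, `greedy_rollout`) are illustrative. The only signal guiding the team is a shared scalar reward of 1 whenever both agents stand in the formation.

```python
import itertools

# Hypothetical toy setup (not from the paper): a 1-D pitch of 5 cells.
N_CELLS = 5
# Hypothetical target formation: one agent on cell 1, the other on cell 3 (either order).
FORMATION = frozenset({1, 3})
# Per-agent moves: step left, stay, step right.
MOVES = (-1, 0, 1)
GAMMA = 0.9  # discount factor

# Joint state = (position of agent 0, position of agent 1); joint action = one move per agent.
STATES = list(itertools.product(range(N_CELLS), repeat=2))
JOINT_ACTIONS = list(itertools.product(MOVES, repeat=2))


def step(state, joint_action):
    """Apply both agents' moves simultaneously, clipping at the pitch edges."""
    return tuple(min(max(p + a, 0), N_CELLS - 1) for p, a in zip(state, joint_action))


def team_reward(state):
    """The only guiding signal: one scalar shared by the team (1.0 in formation, else 0.0)."""
    return 1.0 if frozenset(state) == FORMATION else 0.0


def q_value_iteration(sweeps=300):
    """Centralized Q-value iteration over joint actions (deterministic transitions)."""
    q = {(s, a): 0.0 for s in STATES for a in JOINT_ACTIONS}
    for _ in range(sweeps):
        # State values under the current Q estimates.
        v = {s: max(q[(s, a)] for a in JOINT_ACTIONS) for s in STATES}
        # Bellman optimality backup: reward on arrival plus discounted next-state value.
        q = {
            (s, a): team_reward(step(s, a)) + GAMMA * v[step(s, a)]
            for s in STATES
            for a in JOINT_ACTIONS
        }
    return q


def greedy_rollout(q, start, horizon=10):
    """Follow the greedy joint policy from `start` and return the final joint state."""
    state = start
    for _ in range(horizon):
        best = max(JOINT_ACTIONS, key=lambda a: q[(state, a)])
        state = step(state, best)
    return state


if __name__ == "__main__":
    q = q_value_iteration()
    for start in [(0, 0), (4, 4), (2, 2), (0, 4)]:
        print(start, "->", greedy_rollout(q, start))
```

Running the script prints each start state next to the joint state the greedy team policy reaches; in every case it is (1, 3) or (3, 1), because the reward is only ever earned by entering the formation and staying there. A joint-action table keeps the example exact, but its size grows exponentially with the number of agents, which is one motivation for MARL methods that learn per-agent policies.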

Funder

Sangmyung University

Publisher

MDPI AG

Subject

Fluid Flow and Transfer Processes, Computer Science Applications, Process Chemistry and Technology, General Engineering, Instrumentation, General Materials Science

Link

https://www.mdpi.com/2076-3417/13/6/3622/pdf

References (47 articles):

1. Mnih. Human-level control through deep reinforcement learning. Nature, 2015.

2. Silver. Mastering the game of Go with deep neural networks and tree search. Nature, 2016.

3. Sutton, R.S., and Barto, A.G. Reinforcement Learning: An Introduction. MIT Press, 1998.

4. Silver. Reward is enough. Artif. Intell., 2021.

5. Algorithms for reinforcement learning. Synth. Lect. Artif. Intell. Mach. Learn., 2010.

Cited by 1 article:

1. Prediction of the Opponents Actions in Soccer Simulation based on Location of Players. 2024 XXVII International Conference on Soft Computing and Measurements (SCM), 2024-05-22.