Author:
Yuan Yuyu,Guo Ting,Zhao Pengqian,Jiang Hongpu
Abstract
Social dilemmas have guided research on mutual cooperation for decades, especially the two-person social dilemma. Most famously, Tit-for-Tat performs very well in tournaments of the Prisoner’s Dilemma. Nevertheless, they treat the options to cooperate or defect only as an atomic action, which cannot satisfy the complexity of the real world. In recent research, these options to cooperate or defect were temporally extended. Here, we propose a novel adherence-based multi-agent reinforcement learning algorithm for achieving cooperation and coordination by rewarding agents who adhere to other agents. The evaluation of adherence is based on counterfactual reasoning. During training, each agent observes the changes in the actions of other agents by replacing its current action, thereby calculating the degree of adherence of other agents to its behavior. Using adherence as an intrinsic reward enables agents to consider the collective, thus promoting cooperation. In addition, the adherence rewards of all agents are calculated in a decentralized way. We experiment in sequential social dilemma environments, and the results demonstrate the potential for the algorithm to enhance cooperation and coordination and significantly increase the scores of the deep RL agents.
Funder
National Natural Science Foundation of China
Subject
Fluid Flow and Transfer Processes,Computer Science Applications,Process Chemistry and Technology,General Engineering,Instrumentation,General Materials Science
Reference48 articles.
1. Independent reinforcement learners in cooperative Markov games: a survey regarding coordination problems
2. Deep reinforcement learning for autonomous driving;Wang;arXiv,2018
3. Leveraging procedural generation to benchmark reinforcement learning;Cobbe;Proceedings of the International Conference on Machine Learning, PMLR,2020
4. Data efficient reinforcement learning for legged robots;Yang;Proceedings of the Conference on Robot Learning, PMLR,2020
5. A utility-based matching mechanism for stable and optimal resource allocation in cloud manufacturing platforms using deferred acceptance algorithm
Cited by
3 articles.
订阅此论文施引文献
订阅此论文施引文献,注册后可以免费订阅5篇论文的施引文献,订阅后可以查看论文全部施引文献