Decentralized multi-agent cooperation via adaptive partner modeling

Author:

Xu Chenhang, Wang Jia, Zhu Xiaohui, Yue Yong, Zhou Weifeng, Liang Zhixuan, Wojtczak Dominik

Abstract

Multi-agent reinforcement learning faces a non-stationarity challenge: agents update their policies concurrently, so from each agent's perspective the environment keeps changing. Existing approaches tackle this challenge by having agents communicate to obtain their partners' actions, but this introduces a cost known as partner sample complexity. An alternative is to build partner models that generate samples in place of direct communication, which mitigates this complexity. However, a discrepancy arises between the distribution of the partners' real policies and the policies of the partner models. This discrepancy, termed model bias, can significantly degrade performance when an agent relies heavily on partner models. To achieve a trade-off between sample complexity and performance, a novel multi-agent model-based reinforcement learning algorithm called decentralized adaptive partner modeling (DAPM) is proposed, which uses fictitious self-play (FSP) to construct partner models and update policies. Model bias is addressed by establishing an upper bound that restricts the usage of partner models. In addition, an adaptive rollout approach is introduced that lets real agents dynamically interact with partner models according to their quality, so that agent performance can improve progressively with partner model samples. The effectiveness of DAPM is demonstrated on two multi-agent tasks, where it outperforms existing model-free algorithms in partner sample complexity and training stability. Specifically, DAPM requires 28.5% fewer communications than the best baseline and exhibits smaller fluctuations in its learning curve, indicating more stable training.
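The abstract does not give DAPM's actual bound or rollout rule, so the following is only a rough sketch of the general idea: estimate how often a partner model disagrees with the real partner, and allow fewer model-generated steps as that disagreement grows. Everything here is an assumption made for illustration, including the names `EmpiricalPartnerModel` and `model_steps`, the empirical-frequency model (a simple fictitious-play style stand-in), the disagreement rate used as a proxy for model bias, and the linear rule with its `bias_cap` threshold.

```python
from collections import Counter, deque
import random


class EmpiricalPartnerModel:
    """Hypothetical partner model: samples actions from the partner's
    recent empirical action frequencies (fictitious-play style)."""

    def __init__(self, n_actions, window=100):
        self.n_actions = n_actions
        self.history = deque(maxlen=window)  # recent real partner actions
        self.errors = deque(maxlen=window)   # 1 if the model guessed wrong, else 0

    def predict(self, rng=random):
        # With no evidence yet, fall back to a uniform random action.
        if not self.history:
            return rng.randrange(self.n_actions)
        counts = Counter(self.history)
        actions = list(counts.keys())
        weights = list(counts.values())
        return rng.choices(actions, weights=weights, k=1)[0]

    def update(self, real_action, rng=random):
        # Score the model against the real action before learning from it.
        guess = self.predict(rng)
        self.errors.append(int(guess != real_action))
        self.history.append(real_action)

    def bias(self):
        # Assumed proxy for model bias: recent disagreement rate.
        # Returns 1.0 (worst case) until any evidence has been collected.
        if not self.errors:
            return 1.0
        return sum(self.errors) / len(self.errors)


def model_steps(bias, max_steps, bias_cap):
    """Assumed adaptive rule: number of consecutive steps that may use the
    partner model instead of communicating with the real partner.
    At or above bias_cap the model is not used at all."""
    if bias >= bias_cap:
        return 0
    return int(max_steps * (1.0 - bias / bias_cap))
```

Under these assumptions, an agent would call `update` whenever it communicates with the real partner, then use `model_steps(model.bias(), max_steps, bias_cap)` to decide how many subsequent steps can be filled with `predict` samples before communicating again.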

Funder

Suzhou Science and Technology Project

Research Development Fund of XJTLU

Key Programme Special Fund of XJTLU

Suzhou Municipal Key Laboratory for Intelligent Virtual Engineering

Publisher

Springer Science and Business Media LLC

