Sample-efficient multi-agent reinforcement learning with masked reconstruction

Authors:

Kim Jung In, Lee Young Jae, Heo Jongkook, Park Jinhyeok, Kim Jaehoon, Lim Sae Rin, Jeong Jinyong, Kim Seoung Bum

Abstract

Deep reinforcement learning (DRL) is a powerful approach that combines reinforcement learning (RL) and deep learning to address complex decision-making problems in high-dimensional environments. Although DRL has been remarkably successful, its low sample efficiency requires long training times and large amounts of data to learn optimal policies. This limitation is even more pronounced in multi-agent reinforcement learning (MARL). Various studies have therefore sought to improve the sample efficiency of DRL. In this study, we propose an approach that combines a masked reconstruction task with QMIX (M-QMIX). By introducing masked reconstruction as an auxiliary task, we aim to mitigate low sample efficiency, a fundamental limitation of RL in multi-agent systems. Experiments were conducted on the StarCraft II micromanagement benchmark to validate the effectiveness of the proposed method. We used 11 scenarios comprising five easy, three hard, and three very hard scenarios, and we deliberately limited the number of time steps in each scenario to assess sample efficiency. The proposed method outperforms QMIX in eight of the 11 scenarios. These results provide strong evidence that the proposed method is more sample-efficient than QMIX, demonstrating that it effectively addresses this limitation of DRL in multi-agent systems.
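
The sketch below illustrates the general idea described in the abstract: a masked reconstruction loss on per-agent observations trained jointly with a QMIX-style temporal-difference loss. It is a minimal PyTorch example under assumed shapes; the module names, mask ratio, and the weighting coefficient aux_weight are hypothetical choices for illustration and are not taken from the paper.

# Minimal, illustrative sketch of a masked-reconstruction auxiliary loss
# combined with a QMIX-style TD loss. Shapes, module names, and the
# coefficient `aux_weight` are assumptions, not the authors' implementation.
import torch
import torch.nn as nn
import torch.nn.functional as F


class MaskedObsReconstruction(nn.Module):
    """Mask a random subset of observation features and reconstruct the
    original per-agent observation from the masked input."""

    def __init__(self, obs_dim: int, hidden_dim: int = 64, mask_ratio: float = 0.5):
        super().__init__()
        self.mask_ratio = mask_ratio
        self.encoder = nn.Sequential(nn.Linear(obs_dim, hidden_dim), nn.ReLU())
        self.decoder = nn.Linear(hidden_dim, obs_dim)

    def forward(self, obs: torch.Tensor) -> torch.Tensor:
        # obs: (batch, n_agents, obs_dim)
        mask = (torch.rand_like(obs) < self.mask_ratio).float()
        masked_obs = obs * (1.0 - mask)          # zero out the masked features
        recon = self.decoder(self.encoder(masked_obs))
        # Only the masked entries contribute reconstruction error
        return F.mse_loss(recon * mask, obs * mask)


def total_loss(td_loss: torch.Tensor,
               recon_module: MaskedObsReconstruction,
               obs: torch.Tensor,
               aux_weight: float = 0.1) -> torch.Tensor:
    """Add the auxiliary reconstruction loss to the usual QMIX TD loss."""
    return td_loss + aux_weight * recon_module(obs)

In practice the encoder would typically be shared with the agents' Q-networks, so that the auxiliary gradients shape the representation used for action-value estimation; aux_weight is a hypothetical hyperparameter balancing the two objectives.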

Funder

National Research Foundation of Korea

Publisher

Public Library of Science (PLoS)

Subject

Multidisciplinary

