Counterfactual Multi-Agent Policy Gradients

Published:2018-04-29 Issue:1 Volume:32 Page:
ISSN:2374-3468
Container-title:Proceedings of the AAAI Conference on Artificial Intelligence
Short-container-title:AAAI

Author:

Foerster Jakob, Farquhar Gregory, Afouras Triantafyllos, Nardelli Nantas, Whiteson Shimon

Abstract

Many real-world problems, such as network packet routing and the coordination of autonomous vehicles, are naturally modelled as cooperative multi-agent systems. There is a great need for new reinforcement learning methods that can efficiently learn decentralised policies for such systems. To this end, we propose a new multi-agent actor-critic method called counterfactual multi-agent (COMA) policy gradients. COMA uses a centralised critic to estimate the Q-function and decentralised actors to optimise the agents' policies. In addition, to address the challenges of multi-agent credit assignment, it uses a counterfactual baseline that marginalises out a single agent's action, while keeping the other agents' actions fixed. COMA also uses a critic representation that allows the counterfactual baseline to be computed efficiently in a single forward pass. We evaluate COMA in the testbed of StarCraft unit micromanagement, using a decentralised variant with significant partial observability. COMA significantly improves average performance over other multi-agent actor-critic methods in this setting, and the best performing agents are competitive with state-of-the-art centralised controllers that get access to the full state.
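The counterfactual baseline described in the abstract can be illustrated with a minimal sketch. It assumes a discrete action set for one agent and that the centralised critic has already returned, in a single forward pass, one Q-value for each of that agent's possible actions while the other agents' actions are held fixed. The function name, argument names, and the numbers are hypothetical and are not taken from the paper's implementation.

```python
from typing import Sequence


def counterfactual_advantage(
    q_values: Sequence[float],
    policy_probs: Sequence[float],
    taken_action: int,
) -> float:
    """Advantage of one agent's chosen action against a counterfactual baseline.

    q_values[k]:     critic estimate for the joint action in which this agent
                     plays action k and all other agents' actions stay fixed.
    policy_probs[k]: probability this agent's policy assigns to action k.
    taken_action:    index of the action the agent actually took.
    """
    if len(q_values) != len(policy_probs):
        raise ValueError("q_values and policy_probs must have the same length")
    if abs(sum(policy_probs) - 1.0) > 1e-6:
        raise ValueError("policy_probs must sum to 1")
    # Baseline: expected Q under the agent's own policy, which marginalises
    # out this agent's action only.
    baseline = sum(p * q for p, q in zip(policy_probs, q_values))
    return q_values[taken_action] - baseline


# Illustrative numbers only (assumed, not from the paper).
q = [1.0, 3.0, 2.0]
pi = [0.5, 0.25, 0.25]
advantage = counterfactual_advantage(q, pi, taken_action=1)
print(advantage)  # 1.25
```

Because the baseline is the policy-weighted mean of the same Q-values, the advantages average to zero under the agent's policy. Each agent is therefore credited only for how its own action compares with its alternatives.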

Publisher

Association for the Advancement of Artificial Intelligence (AAAI)

Subject

General Medicine

Cited by 510 articles.

1. Centralised rehearsal of decentralised cooperation: Multi-agent reinforcement learning for the scalable coordination of residential energy flexibility; Applied Energy; 2025-01

2. Decentralized Counterfactual Value with Threat Detection for Multi-Agent Reinforcement Learning in mixed cooperative and competitive environments; Expert Systems with Applications; 2024-12

3. A sequential multi-agent reinforcement learning framework for different action spaces; Expert Systems with Applications; 2024-12

4. A fast strategy-solving method for adversarial team games utilizing warm starting; Neurocomputing; 2024-12

5. MuDE: Multi-agent decomposed reward-based exploration; Neural Networks; 2024-11