Constraint‐based multi‐agent reinforcement learning for collaborative tasks

Published:2023-05 Issue:3-4 Volume:34
ISSN:1546-4261
Container-title:Computer Animation and Virtual Worlds
language:en
Short-container-title:Computer Animation & Virtual

Author:

Shang Xiumin¹, Xu Tengyu², Karamouzas Ioannis³, Kallmann Marcelo¹

Affiliation:

1. Department of Computer Science, School of Engineering, University of California, Merced, Merced, California, USA

2. Meta Platform Inc., Menlo Park, California, USA

3. School of Computing, Clemson University, Clemson, South Carolina, USA

Abstract

In order to be successfully executed, collaborative tasks performed by two agents often require a cooperative strategy to be learned. In this work, we propose a constraint‐based multi‐agent reinforcement learning approach called constrained multi‐agent soft actor critic (C‐MSAC) to train control policies for simulated agents performing collaborative multi‐phase tasks. Given a task with N phases, the first N−1 phases are treated as constraints for the final task phase objective, which is addressed with a centralized training and decentralized execution approach. We demonstrate our framework on a tray balancing task with two phases: tray lifting and cooperative tray control for target following. We evaluate our proposed approach and compare it against its unconstrained variant (MSAC). The comparisons show that C‐MSAC leads to higher success rates, more robust control policies, and better generalization performance.

Publisher

Wiley

Subject

Computer Graphics and Computer-Aided Design,Software

Link

https://onlinelibrary.wiley.com/doi/pdf/10.1002/cav.2182

References (39 articles)

1. Lowe R, Wu Y, Tamar A, Harb J, Abbeel P, Mordatch I. Multi‐agent actor‐critic for mixed cooperative‐competitive environments. Proceedings of the 31st Conference on Neural Information Processing Systems; 2017. p. 30.

2. Schulman J, Wolski F, Dhariwal P, Radford A, Klimov O. Proximal policy optimization algorithms. arXiv preprint arXiv:1707.06347; 2017.

3. Haarnoja T, Zhou A, Abbeel P, Levine S. Soft actor‐critic: Off‐policy maximum entropy deep reinforcement learning with a stochastic actor. Proceedings of the 35th International Conference on Machine Learning; 2018. p. 1861–70.

4. Brockman G, Cheung V, Pettersson L, Schneider J, Schulman J, Tang J, et al. OpenAI Gym. arXiv preprint arXiv:1606.01540; 2016.

5. OpenAI. Robogym. 2020. https://github.com/openai/robogym