Multi-agent Gradient-Based Off-Policy Actor-Critic Algorithm for Distributed Reinforcement Learning

Published: 2024-06-24
Volume: 17
Issue: 1
ISSN: 1875-6883
Journal: International Journal of Computational Intelligence Systems (Int J Comput Intell Syst)
Language: English

Author: Ren Jineng

Abstract

This paper proposes a gradient-based multi-agent actor-critic algorithm for off-policy reinforcement learning using importance sampling. Our algorithm is incremental with full gradients, and its per-iteration complexity scales linearly with the number of approximation features. Previous multi-agent actor-critic algorithms are limited to the on-policy setting or to off-policy emphatic temporal difference (TD) learning, and they do not take advantage of the advances in off-policy gradient temporal difference (GTD) learning. As a theoretical contribution, we establish that the critic step of the proposed algorithm converges to the TD solution of the projected Bellman equation and that the actor step converges to the set of asymptotically stable fixed points. Numerical experiments on a multi-agent generalization of Boyan's chain problem show that the proposed approach improves performance in terms of stability and convergence rate compared with the state-of-the-art baseline algorithm.
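The abstract names the ingredients of the critic (importance sampling, a GTD-family update whose cost per step is linear in the number of features, and a multi-agent setting) but not the update equations. The following is a minimal sketch under stated assumptions, not the paper's algorithm: it uses the off-policy TDC update (one member of the GTD family) with a per-step importance ratio rho = pi(a|s) / mu(a|s), plus a consensus-averaging step of the kind commonly used for networked agents. For linear features, the critic's target is the weight vector theta satisfying E_mu[rho * delta * phi] = 0, which is the TD solution of the projected Bellman equation. All names (`tdc_step`, `consensus`, `importance_ratio`, `alpha`, `beta`, `mix`) are hypothetical, and the actor step is omitted.

```python
# Minimal sketch (hypothetical, NOT the paper's algorithm): off-policy TDC
# critic (a GTD-family method) with per-step importance sampling and linear
# features, plus an assumed consensus-averaging step for networked agents.
from typing import List, Tuple

Vector = List[float]


def dot(u: Vector, v: Vector) -> float:
    """Inner product of two equal-length vectors."""
    return sum(a * b for a, b in zip(u, v))


def importance_ratio(pi_prob: float, mu_prob: float) -> float:
    """rho = pi(a|s) / mu(a|s): target over behavior action probability."""
    return pi_prob / mu_prob


def tdc_step(theta: Vector, w: Vector, phi: Vector, phi_next: Vector,
             reward: float, rho: float, gamma: float,
             alpha: float, beta: float) -> Tuple[Vector, Vector]:
    """One off-policy TDC update; cost is O(d) for d features.

    theta: value-function weights (slow timescale, step size alpha).
    w: auxiliary weights tracking the expected TD error in feature space
       (fast timescale, step size beta).
    """
    delta = reward + gamma * dot(theta, phi_next) - dot(theta, phi)  # TD error
    w_phi = dot(w, phi)
    # Gradient-correction update: rho * (delta * phi - gamma * (w . phi) * phi')
    new_theta = [t + alpha * rho * (delta * f - gamma * w_phi * fn)
                 for t, f, fn in zip(theta, phi, phi_next)]
    # Auxiliary update: w += beta * (rho * delta - w . phi) * phi
    new_w = [wi + beta * (rho * delta - w_phi) * f for wi, f in zip(w, phi)]
    return new_theta, new_w


def consensus(params: List[Vector], mix: List[List[float]]) -> List[Vector]:
    """Agent i replaces its weights with sum_j mix[i][j] * params[j].

    `mix` is assumed doubly stochastic, with nonzero entries only between
    agents that can communicate (an assumption, not taken from the paper).
    """
    n, d = len(params), len(params[0])
    return [[sum(mix[i][j] * params[j][k] for j in range(n)) for k in range(d)]
            for i in range(n)]


if __name__ == "__main__":
    # Toy check: deterministic two-state cycle A -> B -> A, reward 1 per step,
    # tabular features, on-policy (rho = 1). True values are 1 / (1 - 0.5) = 2.
    phi_a, phi_b = [1.0, 0.0], [0.0, 1.0]
    theta, w = [0.0, 0.0], [0.0, 0.0]
    for step in range(5000):
        phi, phi_next = (phi_a, phi_b) if step % 2 == 0 else (phi_b, phi_a)
        theta, w = tdc_step(theta, w, phi, phi_next, reward=1.0,
                            rho=importance_ratio(0.5, 0.5), gamma=0.5,
                            alpha=0.05, beta=0.1)
    print([round(t, 3) for t in theta])
```

On the toy cycle the printed values should be close to 2.0 for both states. The auxiliary weights `w` run on a faster step size than `theta`, the usual two-timescale arrangement for GTD-family methods; in a multi-agent run, each agent would call `tdc_step` on its local reward and then `consensus` on the stacked weights.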

Funders: Wenzhou University; McGill University

Publisher: Springer Science and Business Media LLC

Link: https://link.springer.com/content/pdf/10.1007/s44196-024-00560-2.pdf
