Abstract
Can artificial agents benefit from human conventions? Human societies manage to successfully self-organize and resolve the tragedy of the commons in common-pool resources, despite the bleak predictions of non-cooperative game theory. Moreover, real-world problems are inherently large-scale and of low observability. One key concept that facilitates human coordination in such settings is the use of conventions. Inspired by human behavior, we investigate the learning dynamics and emergence of temporal conventions, focusing on common-pool resources. Extra emphasis was given to designing a realistic evaluation setting: (a) environment dynamics are modeled on real-world fisheries, (b) we assume decentralized learning, where agents can observe only their own history, and (c) we run large-scale simulations (up to 64 agents). Uncoupled policies and low observability make cooperation hard to achieve; as the number of agents grows, the probability of taking a correct gradient direction decreases exponentially. By introducing an arbitrary common signal (e.g., date, time, or any periodic set of numbers) as a means to couple the learning process, we show that temporal conventions can emerge and agents reach sustainable harvesting strategies. The introduction of the signal consistently improves the social welfare (by 258% on average, up to 3306%), the range of environmental parameters where sustainability can be achieved (by 46% on average, up to 300%), and the convergence speed in low-abundance settings (by 13% on average, up to 53%).
Publisher
Springer Science and Business Media LLC