Author:
Agarwal Mridul,Aggarwal Vaneet,Ghosh Arnob,Tiwari Nilay
Abstract
Stochastic games provide a framework for interactions among multiple agents and enable a myriad of applications. In these games, agents decide on actions simultaneously. After taking an action, the state of every agent updates to the next state, and each agent receives a reward. However, finding an equilibrium (if exists) in this game is often difficult when the number of agents becomes large. This paper focuses on finding a mean-field equilibrium (MFE) in an action-coupled stochastic game setting in an episodic framework. It is assumed that an agent can approximate the impact of the other agents’ by the empirical distribution of the mean of the actions. All agents know the action distribution and employ lower-myopic best response dynamics to choose the optimal oblivious strategy. This paper proposes a posterior sampling-based approach for reinforcement learning in the mean-field game, where each agent samples a transition probability from the previous transitions. We show that the policy and action distributions converge to the optimal oblivious strategy and the limiting distribution, respectively, which constitute an MFE.
Funder
National Science Foundation
Subject
Computational Mathematics,Computational Theory and Mathematics,Numerical Analysis,Theoretical Computer Science
Reference26 articles.
1. Cooperative Multi-Agent Learning: The State of the Art
2. Nash Q-learning for general-sum stochastic games;Hu;J. Mach. Learn. Res.,2003
3. Mean Field Multi-Agent Reinforcement Learning;Yang;arXiv,2018
Cited by
4 articles.
订阅此论文施引文献
订阅此论文施引文献,注册后可以免费订阅5篇论文的施引文献,订阅后可以查看论文全部施引文献