1. Agarwal, A., Kakade, S.M., Lee, J.D., and Mahajan, G. (2020). On the theory of policy gradient methods: Optimality, approximation, and distribution shift.
2. Decentralized q-learning for stochastic teams and games;Arslan;IEEE Transactions on Automatic Control,2016
3. Bertrand, N., Markey, N., Sadhukhan, S., and Sankur, O. (2020). Dynamic network congestion games. arXiv preprint arXiv:2009.13632.
4. Rational and convergent learning in stochastic games;Bowling,2001
5. Multi-agent reinforcement learning: An overview;Buşoniu;Innovations in multi-agent systems and applications-1,2010