1. P. Auer, N. Cesa-Bianchi, Y. Freund, R.E. Schapire, The nonstochastic multiarmed bandit problem. SIAM J. Comput. 32(1), 48–77 (2002)
2. H. Gu, X. Guo, X. Wei, R. Xu, Mean-field multi-agent reinforcement learning: a decentralized network approach. arXiv:2108.02731 (2021)
3. R. Gummadi, R. Johari, S. Schmit, J. Yu, Mean field analysis of multi-armed bandit games. SSRN (2016)
4. E. Hazan, Introduction to Online Convex Optimization (MIT Press, 2021)
5. P. Hu, Y. Chen, L. Huang, Nearly minimax optimal reinforcement learning with linear function approximation, in The Thirty-ninth International Conference on Machine Learning (2022)