1. Abernethy, J., Lee, C., Sinha, A., & Tewari, A. (2014). Online linear optimization via smoothing. Conference on Learning Theory, Vol. 35 (pp. 807–823).
2. Agrawal, S., & Goyal, N. (2013). Further optimal regret bounds for Thompson sampling. Artificial Intelligence and Statistics, Vol. 31 (pp. 99–107).
3. Anandkumar, A., Michael, N., Tang, A. K., & Swami, A. (2011). Distributed algorithms for learning and cognitive medium access with logarithmic regret. IEEE Journal on Selected Areas in Communications, 29(4), 731–745.
4. Bistritz, I., & Leshem, A. (2018). Distributed multi-player bandits - a Game of Thrones approach. Advances in Neural Information Processing Systems, 31 (pp. 7222–7232).
5. Boucheron, S., Lugosi, G., & Massart, P. (2016). Concentration inequalities: A nonasymptotic theory of independence. Oxford University Press.