1. S. Agrawal, N. Goyal, Analysis of Thompson sampling for the multi-armed bandit problem, in: Proceedings of the 25th Annual Conference on Learning Theory, 2012, pp. 39.1–39.26.
2. S. Agrawal, N. Goyal, Further optimal regret bounds for Thompson sampling, in: Proceedings of the 16th International Conference on Artificial Intelligence and Statistics, 2013, pp. 99–107.
3. Åström, Adaptive Control, 1994.
4. J.-Y. Audibert, S. Bubeck, Minimax policies for adversarial and stochastic bandits, in: Conference on Learning Theory (COLT), 2009.
5. Auer, Finite-time analysis of the multiarmed bandit problem, Mach. Learn., 2002.