1. Sutton, R.S., and Barto, A.G. (2018). Reinforcement Learning: An Introduction, Bradford Book. [2nd ed.].
2. Auer, P. (2002). Using Confidence Bounds for Exploitation-Exploration Trade-offs. J. Mach. Learn. Res.
3. Auer, P., Cesa-Bianchi, N., Freund, Y., and Schapire, R.E. (2002). The non-stochastic multi-armed bandit problem. SIAM J. Comput.
4. Garivier, A., and Cappé, O. (2011, January 24). The KL-UCB Algorithm for Bounded Stochastic Bandits and Beyond. Proceedings of the 24th Annual Conference on Learning Theory, Budapest, Hungary.
5. Thompson, W.R. (1933). On the Likelihood that One Unknown Probability Exceeds Another in View of the Evidence of Two Samples. Biometrika.