1. Agrawal S, Goyal N (2012) Analysis of Thompson sampling for the multi-armed bandit problem. Mannor S, Srebro N, Williamson RC, eds. Proc. 25th Annual Conf. Learn. Theory (COLT) (PMLR, Edinburgh, UK), 39.1–39.26.
2. Audibert JY, Bubeck S (2010) Best arm identification in multi-armed bandits. Proc. 23rd Annual Conf. Learn. Theory (COLT), Haifa, Israel.
3. Bertsimas D, Niño-Mora J (2000) Restless bandits, linear programming relaxations, and a primal-dual index heuristic. Oper. Res. 48(1):80–90.
4. Bertsimas D, Perakis G (2006) Dynamic pricing: A learning approach. Lawphongpanich S, Hearn DW, Smith MJ, eds. Mathematical and Computational Models for Congestion Charging (Springer, New York), 45–79.