1. Yasin Abbasi-Yadkori, Dávid Pál, and Csaba Szepesvári. 2011. Improved algorithms for linear stochastic bandits. In Advances in Neural Information Processing Systems. 2312–2320.
2. Shipra Agrawal and Navin Goyal. 2013. Thompson sampling for contextual bandits with linear payoffs. In International Conference on Machine Learning. 127–135.
3. Peter Auer. 2002. Using confidence bounds for exploitation-exploration trade-offs. Journal of Machine Learning Research 3, 397–422.
4. Peter Auer and Ronald Ortner. 2010. UCB revisited: Improved regret bounds for the stochastic multi-armed bandit problem. Periodica Mathematica Hungarica 61, 1–2, 55–65.