1. Abbasi-Yadkori Y, Pál D, Szepesvári C (2011) Improved algorithms for linear stochastic bandits. In: Advances in neural information processing systems 24: 25th annual conference on neural information processing systems 2011. Proceedings of a meeting held 12–14 Dec 2011, Granada, Spain, pp 2312–2320
2. Agrawal S, Goyal N (2012a) Analysis of Thompson sampling for the multi-armed bandit problem. In: COLT 2012—the 25th annual conference on learning theory, 25–27 June 2012, Edinburgh, Scotland, pp 39.1–39.26
3. Agrawal S, Goyal N (2012b) Further optimal regret bounds for Thompson sampling. In: Proceedings of the 16th international conference on artificial intelligence and statistics, AISTATS 2013, PMLR 31, pp 99–107
4. Agrawal S, Goyal N (2013) Thompson sampling for contextual bandits with linear payoffs. In: Proceedings of the 30th international conference on machine learning, ICML 2013, Atlanta, GA, USA, 16–21 June 2013, pp 127–135
5. Agarwal A, Hsu D, Kale S, Langford J, Li L, Schapire RE (2014) Taming the monster: a fast and simple algorithm for contextual bandits. In: Proceedings of the 31st international conference on machine learning, ICML 2014, pp 1638–1646