1. Improved algorithms for linear stochastic bandits;Abbasi-Yadkori,2011
2. Further optimal regret bounds for Thompson sampling;Agrawal,2013
3. Best arm identification in multi-armed bandits;Audibert,2010
4. Finite-time analysis of the multiarmed bandit problem;Auer;Mach. Learn.,2002
5. UCB revisited: Improved regret bounds for the stochastic multi-armed bandit problem;Auer;Period. Math. Hungar.,2010