1. Peter Auer. 2002. Using confidence bounds for exploitation-exploration trade-offs. Journal of Machine Learning Research 3 (2002).
2. Peter Auer, Nicolo Cesa-Bianchi, and Paul Fischer. 2002. Finite-time analysis of the multiarmed bandit problem. Machine Learning 47, 2-3 (2002), 235--256. DOI: 10.1023/A:1013689704352
3. Miroslav Dudik, Daniel Hsu, Satyen Kale, Nikos Karampatziakis, John Langford, Lev Reyzin, and Tong Zhang. 2011. Efficient optimal learning for contextual bandits. arXiv preprint arXiv:1106.2369 (2011).
4. John Langford and Tong Zhang. 2008. The epoch-greedy algorithm for multi-armed bandits with side information. In Advances in Neural Information Processing Systems. 817--824.
5. Learning diverse rankings with multi-armed bandits