1. Agrawal, R.: Sample mean based index policies with O(log n) regret for the multi-armed bandit problem. Advances in Applied Probability 27, 1054–1078 (1995)
2. Audibert, J.-Y., Munos, R., Szepesvári, Cs.: Variance estimates and exploration function in multi-armed bandit. Research report 07-31, Certis - Ecole des Ponts (2007),
http://cermics.enpc.fr/~audibert/RR0731.pdf
3. Auer, P., Cesa-Bianchi, N., Fischer, P.: Finite-time analysis of the multiarmed bandit problem. Machine Learning 47(2-3), 235–256 (2002)
4. Auer, P., Cesa-Bianchi, N., Shawe-Taylor, J.: Exploration versus exploitation challenge. In: 2nd PASCAL Challenges Workshop. Pascal Network (2006)
5. Gittins, J.C.: Multi-armed Bandit Allocation Indices. In: Wiley-Interscience series in systems and optimization. Wiley, Chichester (1989)