1. Audibert JY, Bubeck S, Munos R (2010) Best arm identification in multi-armed bandits. In: Proceedings of the 23rd conference on learning theory (COLT 2010), Haifa, Israel
2. Barlow R, Proschan F (1975) Statistical theory of reliability and life testing: probability models. Holt, Rinehart and Winston, New York
3. Bechhofer RE, Kiefer J, Sobel M (1968) Sequential identification and ranking procedures. The University of Chicago Press, Chicago
4. Bechhofer RE, Kulkarni RV (1982) Closed adaptive sequential procedures for selecting the best of $$k \ge 2$$ Bernoulli populations. In: Gupta SS, Berger JO (eds) Statistical decision theory and related topics III, vol 1. Academic Press, New York, pp 61–108
5. Even-Dar E, Mannor S, Mansour Y (2006) Action elimination and stopping conditions for the multi-armed bandit and reinforcement learning problems. J Mach Learn Res 7:1079–1105