1. P. Auer, Using confidence bounds for exploitation–exploration trade-offs, J. Mach. Learn. Res., 2002
2. P. Auer, R. Ortner, Online regret bounds for a new reinforcement learning algorithm, in: First Austrian Cognitive Vision Workshop, 2005, pp. 35–42
3. R.I. Brafman, M. Tennenholtz, R-MAX - A general polynomial time algorithm for near-optimal reinforcement learning, J. Mach. Learn. Res., 2002
4. E. Even-Dar, S. Mannor, Y. Mansour, PAC bounds for multi-armed bandit and Markov decision processes, in: 15th Annual Conference on Computational Learning Theory (COLT), 2002, pp. 255–270
5. E. Even-Dar, S. Mannor, Y. Mansour, Action elimination and stopping conditions for reinforcement learning, in: The Twentieth International Conference on Machine Learning (ICML 2003), 2003, pp. 162–169