Publisher
Springer Berlin Heidelberg
References (59 articles)
1. Agrawal, R.: The continuum-armed bandit problem. SIAM Journal on Control and Optimization 33(6), 1926–1951 (1995)
2. Auer, P.: Using confidence bounds for exploitation-exploration trade-offs. J. Machine Learning Research 3, 397–422 (2002)
3. Auer, P., Cesa-Bianchi, N., Fischer, P.: Finite-time analysis of the multiarmed bandit problem. Machine Learning 47(2/3), 235–256 (2002)
4. Auer, P., Jaksch, T., Ortner, R.: Near-optimal regret bounds for reinforcement learning. In: Proceedings of NIPS 2008 (2008)
5. Auer, P.: Lecture Notes in Artificial Intelligence (2007)