Author:
Garivier Aurélien, Moulines Eric
Publisher
Springer Berlin Heidelberg
References
25 articles.
1. Agrawal, R.: Sample mean based index policies with O(log n) regret for the multi-armed bandit problem. Adv. in Appl. Probab. 27(4), 1054–1078 (1995)
2. Audibert, J.Y.: Lecture Notes in Artificial Intelligence (2007)
3. Auer, P., Cesa-Bianchi, N., Freund, Y., Schapire, R.E.: The nonstochastic multiarmed bandit problem. SIAM J. Comput. 32(1), 48–77 (2002)
4. Auer, P.: Using confidence bounds for exploitation-exploration trade-offs. J. Mach. Learn. Res. 3(Spec. Issue Comput. Learn. Theory), 397–422 (2002)
5. Auer, P., Cesa-Bianchi, N., Fischer, P.: Finite-time analysis of the multiarmed bandit problem. Machine Learning 47(2/3), 235–256 (2002)
Cited by
255 articles.