Dismemberment and Design for Controlling the Replication Variance of Regret for the Multi-Armed Bandit

Published: 2022-08-01 Issue: 4 Volume: 16
ISSN: 1559-8608
Container-title: Journal of Statistical Theory and Practice
Language: en
Short-container-title: J Stat Theory Pract

Author:

Keaton Timothy J., Sabbaghi Arman

Publisher

Springer Science and Business Media LLC

Subject

Statistics and Probability

Link

https://link.springer.com/content/pdf/10.1007/s42519-022-00280-w.pdf

References (35 articles)

1. Agrawal S, Goyal N (2012) Analysis of Thompson sampling for the multi-armed bandit problem. In: Mannor S, Srebro N, Williamson RC (eds) Proceedings of the 25th annual conference on learning theory, proceedings of machine learning research, vol 23. PMLR, Edinburgh, Scotland, pp 39.1–39.26. http://proceedings.mlr.press/v23/agrawal12.html

2. Agrawal S, Goyal N (2013) Further optimal regret bounds for Thompson sampling. In: Sixteenth international conference on artificial intelligence and statistics (AISTATS). https://www.microsoft.com/en-us/research/publication/further-optimal-regret-bounds-for-thompson-sampling/

3. Allenberg C, Auer P, Györfi L, Ottucsák G (2006) Hannan consistency in on-line learning in case of unbounded losses under partial monitoring. In: Algorithmic learning theory: 17th international conference, ALT 2006, Barcelona, Spain, October 7–10, 2006, proceedings, lecture notes in computer science, pp 229–243. Springer, Berlin. https://books.google.com/books?id=lsmpCAAAQBAJ

4. Audibert JY, Munos R, Szepesvári C (2009) Exploration-exploitation tradeoff using variance estimates in multi-armed bandits. Theor Comput Sci 410(19):1876–1902

5. Auer P, Cesa-Bianchi N, Fischer P (2002) Finite-time analysis of the multiarmed bandit problem. Mach Learn 47(2–3):235–256