Abstract
The connection between optimal stopping times of American options and multi-armed bandits is the subject of active research. This article investigates the effects of optional stopping in a particular class of multi-armed bandit experiments, in which observations are randomly allocated to arms in proportion to the Bayesian posterior probability that each arm is optimal (Thompson sampling). The interplay between optional stopping and prior mismatch is examined. We propose a novel partitioning of regret into peri- and post-testing components. We further show a strong dependence of the parameters of interest on the assumed prior probability density.
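The allocation rule described in the abstract can be sketched for the standard Bernoulli-arm setting with a uniform Beta(1, 1) prior; the function names and the Bernoulli reward model are illustrative assumptions, not details taken from the article:

```python
import random

def thompson_step(successes, failures):
    # Draw one sample from each arm's Beta posterior; picking the
    # arm with the largest draw selects arms in proportion to the
    # posterior probability that each arm is optimal.
    samples = [random.betavariate(s + 1, f + 1)
               for s, f in zip(successes, failures)]
    return max(range(len(samples)), key=samples.__getitem__)

def run_bandit(true_means, horizon, seed=0):
    # Simulate Thompson sampling on Bernoulli arms with Beta(1, 1)
    # priors; returns the number of pulls allocated to each arm.
    random.seed(seed)
    k = len(true_means)
    successes, failures = [0] * k, [0] * k
    for _ in range(horizon):
        arm = thompson_step(successes, failures)
        if random.random() < true_means[arm]:
            successes[arm] += 1
        else:
            failures[arm] += 1
    return [s + f for s, f in zip(successes, failures)]
```

Under this sketch, the pull counts concentrate on the arm with the higher true mean as the posterior sharpens, which is the behavior whose interaction with optional stopping and prior mismatch the article studies.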
Cited by 1 article.
1. Latent go-explore with area as unit; Information Processing & Management; 2024-03