1. Learning to optimize via information-directed sampling;russo;Advances in Neural Information Processing Systems,2014
2. Optimism in reinforcement learning and Kullback-Leibler divergence
3. Thompson sampling for linear-quadratic control problems;abeille;Artificial Intelligence and Statistics,2017
4. Thompson sampling for learning parameterized Markov decision processes;gopalan;Conference on Learning Theory,0
5. On Bayesian upper confidence bounds for bandit problems;kaufmann;Artificial Intelligence and Statistics,2012