1. Abeille, 2017. Exploration-Exploitation with Thompson Sampling in Linear Systems.
2. Agarwal, 2020. Reinforcement Learning: Theory and Algorithms.
3. Agrawal, 1995. The Continuum-Armed Bandit Problem. SIAM J. Control Optim.
4. Agrawal, S., Goyal, N., 2012a. Analysis of Thompson Sampling for the Multi-armed Bandit Problem. In: Proceedings of the 25th Annual Conference on Learning Theory. pp. 39.1–39.26.
5. Agrawal, 2012. Thompson Sampling for Contextual Bandits with Linear Payoffs.