1. Abbasi-Yadkori Y. Pál D. and Szepesvári C. (2011) Improved algorithms for linear stochastic bandits in ‘Advances in Neural Information Processing Systems’ pp. 2312–2320.
2. Agrawal S. and Goyal N. (2013) Thompson sampling for contextual bandits with linear payoffs in ‘International Conference on Machine Learning’ PMLR pp. 127–135.
3. Athey S. (2019) The impact of machine learning on economics in ‘The economics of artificial intelligence’ University of Chicago Press pp. 507–552.
4. Auer P. (2002) ‘Using confidence bounds for exploitation-exploration trade-offs’ Journal of Machine Learning Research 3 (Nov) 397–422.
5. Bibaut A. Dimakopoulou M. Kallus N. Chambaz A. and van der Laan M. (2021) ‘Post-contextual-bandit inference’ Advances in Neural Information Processing Systems 34.