1. Agrawal, S., Goyal, N., 2012. Analysis of Thompson sampling for the multi-armed bandit problem. In: Proceedings of the 25th Annual Conference on Learning Theory.
2. Auer, 2002. Finite-time analysis of the multiarmed bandit problem. Mach. Learn.
3. Badia, 2020. Agent57: Outperforming the Atari human benchmark.
4. Badia, 2020. Never give up: Learning directed exploration strategies.
5. Bateson, 2002. Recent advances in our understanding of risk-sensitive foraging preferences. Proc. Nutrition Soc.