1. Schrittwieser, J., et al.: Mastering Atari, Go, chess and shogi by planning with a learned model. Nature 588(7839), 604–609 (2020)
2. Silver, D., et al.: A general reinforcement learning algorithm that masters chess, shogi, and Go through self-play. Science 362(6419), 1140–1144 (2018)
3. Danihelka, I., Guez, A., Schrittwieser, J., Silver, D.: Policy improvement by planning with Gumbel. In: International Conference on Learning Representations (2022)
4. Kool, W., van Hoof, H., Welling, M.: Stochastic beams and where to find them: The Gumbel-top-k trick for sampling sequences without replacement. In: International Conference on Machine Learning, pp. 3499–3508. PMLR (2019)
5. Karnin, Z., Koren, T., Somekh, O.: Almost optimal exploration in multi-armed bandits. In: International Conference on Machine Learning, pp. 1238–1246. PMLR (2013)