1. Agarwal et al., 2021. Deep reinforcement learning at the edge of the statistical precipice.
2. Anthony et al., 2017. Thinking fast and slow with deep learning and tree search.
3. Araki et al., 2007. Move prediction in Go with the maximum entropy method.
4. Auer et al., 2002. Finite-time analysis of the multiarmed bandit problem. Machine Learning.
5. Beal, 1980. The construction of economical and correct algorithms for king and pawn against king.