1. Minimax policies for adversarial and stochastic bandits;Audibert;COLT,2009
2. Finite-time analysis of the multiarmed bandit problem;Auer;Machine Learning,2002
3. Björnsson Y. and Schiffel S., Comparison of GDL reasoners, In Proceedings of the IJCAI-13 Workshop on General Game Playing (GIGA’13), 2013, pp. 55–62.
4. A survey of Monte Carlo tree search methods;Browne;IEEE Transactions on Computational Intelligence and AI in Games,2012