1. Auer, P., Cesa-Bianchi, N., & Fischer, P. (2002). Finite-time analysis of the multi-armed bandit problem. Machine Learning, 47(2–3), 235–256.
2. Balla, R., & Fern, A. (2009). UCT for tactical assault planning in real-time strategy games. In 21st international joint conference on artificial intelligence.
3. Baxter, J., Tridgell, A., & Weaver, L. (2000). Learning to play chess using temporal differences. Machine Learning, 40(3), 243–263.
4. Buro, M. (1999). From simple features to sophisticated evaluation functions. In 1st international conference on computers and games (pp. 126–145).
5. Chaslot, G., Chatriot, L., Fiter, C., Gelly, S., Hoock, J., Perez, J., Rimmel, A., & Teytaud, O. (2008). Combining expert, offline, transient and online knowledge in Monte-Carlo exploration. In 8th European workshop on reinforcement learning.