1. Agrawal S, Goyal N (2012) Analysis of Thompson sampling for the multi-armed bandit problem. In: Conference on learning theory, pp 39.1–39.26
2. Agrawal S, Goyal N (2013) Further optimal regret bounds for Thompson sampling. In: Artificial intelligence and statistics, pp 99–107
3. Anand A, Grover A, Mausam, Singla P (2015) ASAP-UCT: Abstraction of state-action pairs in UCT. In: Yang Q, Wooldridge M (eds) IJCAI. AAAI Press, pp 1509–1515
4. Anand A, Noothigattu R, Mausam, Singla P (2016) OGA-UCT: On-the-go abstractions in UCT. In: Coles AJ, Coles A, Edelkamp S, Magazzeni D, Sanner S (eds) ICAPS. AAAI Press, pp 29–37
5. Asmuth J, Littman ML (2011) Learning is planning: near Bayes-optimal reinforcement learning via Monte-Carlo tree search. In: Uncertainty in artificial intelligence, pp 19–26