1. Abdallah, S., & Lesser, V.R. (2006). Learning the task allocation game. In Proc. of AAMAS ’06 (pp. 850–857). ACM.
2. Agrawal, R. (1995). Sample mean based index policies with O(log n) regret for the multi-armed bandit problem. Advances in Applied Probability, 27(4), 1054–1078.
3. Agrawal, S., & Goyal, N. (2012). Analysis of Thompson sampling for the multi-armed bandit problem. In COLT (pp. 39.1–39.26).
4. Applegate, D.L., Bixby, R.E., Chvátal, V., & Cook, W.J. (2011). The traveling salesman problem: a computational study. Princeton: Princeton University Press.
5. Audibert, J.Y., & Bubeck, S. (2010). Regret bounds and minimax policies under partial monitoring. Journal of Machine Learning Research, 11(Oct), 2785–2836.