1. Auer, P., Cesa-Bianchi, N., Freund, Y., & Schapire, R. E. (1995). Gambling in a rigged casino: The adversarial multi-armed bandit problem. In Proceedings of the thirty-sixth annual symposium on foundations of computer science (pp. 322–331). Milwaukee, WI: IEEE Computer Society Press.
2. Banerjee, B., & Peng, J. (2004). Performance bounded reinforcement learning in strategic interactions. In Proceedings of the nineteenth national conference on artificial intelligence (AAAI-04) (pp. 2–7). San Jose, CA: AAAI Press.
3. Bowling, M. (2005). Convergence and no-regret in multiagent learning. In Proceedings of NIPS 2004/5.
4. Bowling, M., & Veloso, M. (2001). Rational and convergent learning in stochastic games. In Proceedings of the seventeenth international joint conference on artificial intelligence (pp. 1021–1026). Seattle, WA.
5. Bowling, M., & Veloso, M. (2002). Multiagent learning using a variable learning rate. Artificial Intelligence, 136, 215–250.