1. Azar, M. G., Munos, R., Ghavamzadeh, M., & Kappen, H. J. (2011a). Reinforcement learning with a near optimal rate of convergence. Tech. rep. http://hal.inria.fr/inria-00636615.
2. Azar, M. G., Munos, R., Ghavamzadeh, M., & Kappen, H. J. (2011b). Speedy Q-learning. In Advances in neural information processing systems (Vol. 24, pp. 2411–2419).
3. Azar, M. G., Munos, R., & Kappen, H. J. (2012). On the sample complexity of reinforcement learning with a generative model. In ICML. Omnipress.
4. Bartlett, P. L., & Tewari, A. (2009). REGAL: a regularization based algorithm for reinforcement learning in weakly communicating MDPs. In Proceedings of the 25th conference on uncertainty in artificial intelligence (pp. 35–42).
5. Bertsekas, D. P. (2007). Dynamic programming and optimal control (Vol. II, 3rd edn.). Belmont: Athena Scientific.