1. Abounadi, J., Bertsekas, D., Borkar, V.S.: Learning algorithms for Markov decision processes with average cost. SIAM J. Control Optim. 40(3), 681–698 (electronic) (2001)
2. Azar, M.G., Munos, R., Ghavamzadeh, M., Kappen, H.: Speedy Q-learning. In: Advances in Neural Information Processing Systems (2011)
3. Benveniste, A., Métivier, M., Priouret, P.: Adaptive Algorithms and Stochastic Approximations. Springer, Berlin (2012)
4. Bertsekas, D.P.: Dynamic Programming and Optimal Control, vol. 2, 4th edn. Athena Scientific (2012)
5. Bhandari, J., Russo, D., Singal, R.: A finite time analysis of temporal difference learning with linear function approximation (2018). arXiv preprint arXiv:1806.02450