1. J. Abounadi, D.P. Bertsekas, V.S. Borkar, Learning algorithms for Markov decision processes with average cost. SIAM J. Control Optim. 40(3), 681–698 (2001)
2. D.P. Bertsekas, Dynamic Programming and Optimal Control, 3rd edn. (Athena Scientific, Belmont, 2007)
3. D.P. Bertsekas, J.N. Tsitsiklis, Neuro-Dynamic Programming (Athena Scientific, Belmont, 1996)
4. S. Bhatnagar, M.S. Abdulla, Simulation-based optimization algorithms for finite horizon Markov decision processes. Simulation 84(12), 577–600 (2008)
5. D. Blackwell, Discrete dynamic programming. Ann. Math. Stat. 33, 226–235 (1965)