1. Altman, E.: Constrained Markov Decision Processes. Chapman and Hall/CRC Press, London (1999)
2. Bertsekas, D.P., Tsitsiklis, J.N.: Neuro-Dynamic Programming. Athena Scientific, Belmont (1996)
3. Sutton, R.S., Barto, A.G.: Reinforcement Learning: An Introduction. MIT Press, Cambridge (1998)
4. Konda, V.R., Tsitsiklis, J.N.: On actor–critic algorithms. SIAM J. Control Optim. 42(4), 1143–1166 (2003)
5. Bhatnagar, S., Sutton, R.S., Ghavamzadeh, M., Lee, M.: Natural actor–critic algorithms. Automatica 45(11), 2471–2482 (2009)