1. Åström, K.J. (1965). Optimal control of Markov processes with incomplete state information. Journal of Mathematical Analysis and Applications.
2. Baxter, J. and Bartlett, P.L. (2001). Infinite-horizon policy-gradient estimation. Journal of Artificial Intelligence Research.
3. Bellman, R.E. (1957). Dynamic Programming. Princeton University Press, Princeton, NJ, USA, 1st edition.
4. Bertsekas, D.P. (2001). Dynamic Programming and Optimal Control (Volume II). Athena Scientific, Belmont, Massachusetts, 2nd edition.
5. Bertsekas, D.P. and Tsitsiklis, J.N. (1996). Neuro-Dynamic Programming. Athena Scientific, Belmont, Massachusetts.