1. J.A. Bagnell, J.G. Schneider, Covariant policy search, in: IJCAI, 2003, pp. 1019–1024.
2. Baxter, Infinite-horizon policy-gradient estimation, Journal of Artificial Intelligence Research (JAIR), 2001.
3. Bellman, Dynamic Programming, 1957.
4. D.P. Bertsekas, J.N. Tsitsiklis, Neuro-dynamic Programming, Athena Scientific, Belmont, Mass, 1996.
5. Bhatnagar, Natural actor-critic algorithms, Automatica, 2009.