1. D. Aberdeen, Policy-gradient algorithms for partially observable Markov decision processes, Ph.D. Thesis, Australian National University, 2003.
2. Aberdeen, POMDPs and policy gradients, 2006.
3. Amari, Natural gradient works efficiently in learning, Neural Comput., 1998.
4. Bagnell, Covariant policy search, 2003.
5. L.C. Baird, Advantage updating, Technical Report WL-TR-93-1146, Wright Lab., 1993.