1. Aberdeen DA (2003) Policy-gradient algorithms for partially observable Markov decision processes. PhD thesis, Australian National University, April 2003
2. Amari S (1998) Natural gradient works efficiently in learning. Neural Comput 10:251–276
3. Anderson C (2000) Approximating a policy can be easier than approximating a value function. Computer science technical report, Colorado State University
4. Bagnell JA, Schneider JG (2001) Autonomous helicopter control using reinforcement learning policy search methods. In: IEEE international conference on robotics and automation, ICRA. Korea
5. Baird L (1995) Residual algorithms: reinforcement learning with function approximation. In: 12th international conference on machine learning, ICML. San Francisco, USA