1. Aberdeen, D. (2003). Policy-gradient algorithms for partially observable markov decision processes. Ph.D. thesis, Australian National University
2. Reinforcement learning in POMDPs via direct gradient ascent;Baxter,2000
3. Biped dynamic walking using reinforcement learning;Benbrahim;Robotics and Autonomous Systems,1997
4. Buss, M., & Hirche, S. (2008). Institute of Automatic Control Engineering, TU München, Germany. http://www.lsr.ei.tum.de/
5. Completely derandomized self-adaptation in evolution strategies;Hansen;Evolutionary Computation,2001