1. Aberdeen, D. (2006). POMDPs and policy gradients. Presentation at the Machine Learning Summer School (MLSS)
2. Aleksandrov (1968). Stochastic optimization. Engineering Cybernetics
3. Amari (1998). Natural gradient works efficiently in learning. Neural Computation
4. Atkeson (1994). Using local trajectory optimizers to speed up global optimization in dynamic programming
5. Bagnell, J., & Schneider, J. (2003). Covariant policy search. In Proceedings of the international joint conference on artificial intelligence (pp. 1019–1024)