1. Reinforcement Learning: An Introduction
2. Policy gradient methods for reinforcement learning with function approximation;Sutton;Advances in Neural Information Processing Systems,1999
3. A natural policy gradient;Kakade;Advances in Neural Information Processing Systems,2001
4. Trust region policy optimization;Schulman,2015