1. Trust region policy optimization;schulman;International Conference on Machine Learning,2015
2. Playing Atari with deep reinforcement learning;mnih,2013
3. Simple random search provides a competitive approach to reinforcement learning;mania,2018
4. High-dimensional continuous control using generalized advantage estimation;schulman;4th International Conference on Learning Representations ICLR,2016