1. Asynchronous methods for deep reinforcement learning;mnih,2016
2. Trust region policy optimization;schulman,2017
3. Proximal policy optimization algorithms;schulman,2017
4. Scalable trust-region method for deep reinforcement learning using Kronecker-factored approximation;wu,2017
5. Sample efficient actor-critic with experience replay;wang,2017