1. Dueling network architectures for deep reinforcement learning;Z. Wang
2. Asynchronous methods for deep reinforcement learning;V. Mnih
3. Reinforcement learning through asynchronous advantage actor-critic on a GPU;M. Babaeizadeh
4. Trust region policy optimization;J. Schulman
5. Proximal policy optimization algorithms;J. Schulman, 2017