1. Mnih V, Kavukcuoglu K, Silver D, et al. Playing Atari with deep reinforcement learning. arXiv preprint arXiv:1312.5602. 2013.
2. Schulman J, et al. Trust region policy optimization. International Conference on Machine Learning. PMLR. 2015.
3. Lillicrap TP, Hunt JJ, Pritzel A, et al. Continuous control with deep reinforcement learning. arXiv preprint arXiv:1509.02971. 2015.
4. Schulman J, Wolski F, Dhariwal P, Radford A, Klimov O. Proximal policy optimization algorithms. arXiv preprint arXiv:1707.06347. 2017.
5. Mnih V, Kavukcuoglu K, Silver D, et al. Human-level control through deep reinforcement learning. Nature. 2015;518(7540):529-533.