1. V. Mnih et al., Human-level control through deep reinforcement learning. Nature 518, 529–533 (2015).
2. V. Mnih et al., Asynchronous methods for deep reinforcement learning. Proc. Int. Conf. Mach. Learn. 48, pp. 1928–1937 (2016).
3. J. Schulman, F. Wolski, P. Dhariwal, A. Radford, O. Klimov, Proximal policy optimization algorithms. arXiv:1707.06347 [cs.LG] (2017).
4. T. P. Lillicrap et al., Continuous control with deep reinforcement learning. Proc. Int. Conf. Learn. Rep. (2016).
5. M. Jaderberg et al., Reinforcement learning with unsupervised auxiliary tasks. Proc. Int. Conf. Learn. Rep. (2017).