1. V. Mnih, K. Kavukcuoglu, D. Silver et al., Playing Atari with deep reinforcement learning, (2013). [2019-11-10] https://arxiv.org/pdf/1312.5602.pdf
2. T.P. Lillicrap, J.J. Hunt, A. Pritzel et al., Continuous control with deep reinforcement learning, (2015). [2019-11-10] https://arxiv.org/pdf/1509.02971.pdf
3. R.S. Sutton, D.A. McAllester, S.P. Singh et al., Policy gradient methods for reinforcement learning with function approximation, in Advances in Neural Information Processing Systems (2000), pp. 1057–1063
4. D. Silver, G. Lever, N. Heess et al., Deterministic policy gradient algorithms, (2014). [2019-11-10] http://xueshu.baidu.com/usercenter/paper/show?paperid=43a8642b81092513eb6bad1f3f5231e2&site=xueshu_se
5. V. Mnih, A.P. Badia, M. Mirza et al., Asynchronous methods for deep reinforcement learning, in International Conference on Machine Learning (2016), pp. 1928–1937