1. Mnih, V., Kavukcuoglu, K., Silver, D., Graves, A., Antonoglou, I., Wierstra, D., and Riedmiller, M. (2013). Playing atari with deep reinforcement learning. arXiv.
2. Q-learning;Watkins;Mach. Learn.,1992
3. Schulman, J., Levine, S., Abbeel, P., Jordan, M., and Moritz, P. (2015, January 6–11). Trust region policy optimization. Proceedings of the International Conference on Machine Learning, Lille, France.
4. Schulman, J., Wolski, F., Dhariwal, P., Radford, A., and Klimov, O. (2017). Proximal policy optimization algorithms. arXiv.
5. Lillicrap, T.P., Hunt, J.J., Pritzel, A., Heess, N., Erez, T., Tassa, Y., Silver, D., and Wierstra, D. (2016, January 2–4). Continuous control with deep reinforcement learning. Proceedings of the 4th International Conference on Learning Representations, ICLR 2016—Conference Track Proceedings, San Juan, Puerto Rico. Available online: http://xxx.lanl.gov/abs/1509.02971.