1. [1] R.S. Sutton and A.G. Barto, Introduction to Reinforcement Learning, Decision Theory Models for Applications in Artificial Intelligence: Concepts and Solutions, pp.90-127, 2011.
2. [2] C.H.C.J. Watkins, “Learning from delayed rewards,” Robotics & Autonomous Systems, vol.15, no.4, pp.233-235, 1989.
3. [3] S. Thrun and A. Schwartz, “Issues in using function approximation for reinforcement learning,” Proc. Fourth Connectionist Models Summer School, vol.14, no.3, pp.65-90, 1993.
4. [4] H.V. Hasselt, “Double Q-learning,” Advances in Neural Information Processing Systems 23, Proceedings of A Meeting Held 6-9 Dec. 2010, Conference on Neural Information Processing Systems 2010, Vancouver, British Columbia, Canada, OAI, pp.2613-2621, 2010.
5. [5] V. Mnih, K. Kavukcuoglu, D. Silver, et al., “Playing Atari with deep reinforcement learning,” arXiv preprint arXiv:1312.5602v1 [cs.LG], 2013.