1. Mnih, V. et al. Nature 518, 529–533 (2015).
2. Sutton, R. S. & Barto, A. G. Reinforcement Learning: An Introduction (MIT Press, 1998).
3. Watkins, C. J. C. H. Learning from Delayed Rewards. PhD thesis, Univ. Cambridge (1989).
4. Guo, X., Singh, S., Lee, H., Lewis, R. L. & Wang, X. Adv. Neural Inf. Process. Syst. 27 (2014).
5. Bareinboim, E. & Pearl, J. in Proc. 25th AAAI Conf. on Artificial Intelligence 100–108 (2011).