1. Watkins, Q-learning, Mach. Learn. (1992).
2. R.S. Sutton, A.G. Barto, Reinforcement learning: An introduction (2018).
3. Tsitsiklis, Asynchronous stochastic approximation and Q-learning, Mach. Learn. (1994).
4. H. van Hasselt, Estimating the maximum expected value: an analysis of (nested) cross validation and the maximum sample average, arXiv preprint arXiv:1302.7175 (2013).
5. S. Thrun, A. Schwartz, Issues in using function approximation for reinforcement learning, in: Proceedings of the Fourth Connectionist Models Summer School, Hillsdale, NJ, 1993, pp. 255–263.