1. C. Watkins, Learning from delayed rewards, Ph.D. Thesis, University of Cambridge, 1989.
2. Watkins, Q-learning, Machine Learning, 1992.
3. Tsitsiklis, Asynchronous stochastic approximation and Q-learning, Machine Learning, 1994.
4. Jaakkola, On the convergence of stochastic iterative dynamic programming algorithms, Neural Computation, 1994.
5. V.S. Borkar, On the number of samples required for Q-learning, in: 38th Allerton Conf. on Communication, Control and Computing, Monticello, Illinois, 2000.