1. R. Bellman, Dynamic programming. Princeton, N.J.: Princeton University Press, 1957.
2. P. Dyer and S. R. McReynolds, The computation and theory of optimal control. New York: Academic Press, 1970.
3. G. Tesauro, “Temporal difference learning of backgammon strategy,” in Proceedings of the Ninth International Workshop on Machine Learning, D. Sleeman and P. Edwards, Eds. San Mateo, CA: Morgan Kaufmann, 1992, pp. 9–18.
4. D. P. Bertsekas and J. N. Tsitsiklis, Neuro-dynamic programming. Belmont, MA: Athena Scientific, 1996.
5. R. S. Sutton and A. G. Barto, Reinforcement learning: An introduction. Cambridge, MA: MIT Press, 1998.