1. Barto. (1990) “Learning and Sequential Decision Making.”
2. A. Barto, S. Bradtke, & S. Singh. (1991) “Real-Time Learning and Control Using Asynchronous Dynamic Programming.” Computer Science Department, University of Massachusetts, Tech. Rept. 91–57.
3. Barto. (1994) “Monte-Carlo Matrix Inversion and Reinforcement Learning.” Neural Information Processing Systems.
4. Bellman. (1956) “A Problem in the Sequential Design of Experiments.” Sankhya.
5. Bertsekas. (1987) Dynamic Programming: Deterministic and Stochastic Models.