1. COINS technical report;A.G. Barto,1991
2. Barto, A.G. & Singh, S.P. (1990). On the computational economics of reinforcement learning. In D.S. Touretzky, J. Elman, T.J. Sejnowski & G.E. Hinton, (Eds.),Proceedings of the 1990 Connectionist Models Summer School. San Mateo, CA: Morgan Kaufmann.
3. Bellman, R.E. & Dreyfus, S.E. (1962).Applied dynamic programming. RAND Corporation.
4. Chapman, D. & Kaelbling, L.P. (1991). Input generalization in delayed reinforcement learning: An algorithm and performance comparisons.Proceedings of the 1991 International Joint Conference on Artificial Intelligence (pp. 726?731).
5. Kushner, H. & Clark, D. (1978).Stochastic approximation methods for constrained and unconstrained systems. Berlin, Germany: Springer-Verlag.