1. Ackley, D. H. & Littman, M. L. (1989), Generalization and scaling in reinforcement learning, in Advances in Neural Information Processing Systems 2, Morgan Kaufmann, San Mateo, CA
2. Anderson, C. W. (1986), Learning and Problem Solving with Multilayer Connectionist Systems, PhD thesis, University of Massachusetts, Amherst, MA
3. Barto, A. G., Bradtke, S. J. & Singh, S. P. (1993), Learning to act using real-time dynamic programming, Technical Report 93–02, Department of Computer and Information Science, University of Massachusetts, Amherst, MA
4. Bellman, R. (1957), Dynamic Programming, Princeton University Press, Princeton, NJ
5. Berry, D. A. & Fristedt, B. (1985), Bandit Problems: Sequential Allocation of Experiments, Chapman and Hall, London, UK
6. Bertsekas, D. P. (1987), Dynamic Programming: Deterministic and Stochastic Models, Prentice-Hall