1. Watkins, C.: Learning from delayed rewards. Ph.D. thesis, University of Cambridge, England (1989)
2. Adaptive Computation and Machine Learning;RS Sutton,1998
3. Wiering, M., Schmidhuber, J.: Fast online q(lambda). Mach. Learn. 33(1), 105–115 (1998)
4. Ribeiro, C., Szepesvári, C.: Q-learning combined with spreading: convergence and results. In: ISRF-IEE International Conference on Intelligent and Cognitive Systems (Neural Networks Symposium), pp. 32–36 (1996)
5. Ribeiro, C., Pegoraro, R., Costa, A.: Experience generalization for concurrent reinforcement learners: the minimax-qs algorithm. In: Proceedings of the First International Joint Conference on Autonomous Agents and Multiagent Systems, pp. 1239–1245. ACM, NY (2002)