1. Sutton, R.S.: Learning to predict by the methods of temporal differences. Machine Learning 3, 9–44 (1988)
2. Lin, L.: Reinforcement Learning for Robots Using Neural Networks, PhD thesis, Carnegie Mellon University, Pittsburgh, Pennsylvania, USA (1993)
3. Rummery, G., Niranjan, M.: On-line Q-Learning Using Connectionist Systems. Cambridge University Engineering Department, Cambridge (1994)
4. Crites, R.H., Barto, A.G.: Improving Elevator Performance Using Reinforcement Learning. In: NIPS-8 (1996)
5. Tesauro, G.J.: Temporal difference learning and TD-Gammon. Communications of the ACM 38(3), 58–68 (1995)