1. Sutton, R.S., Learning to predict by the method of temporal differences, Machine Learning, 3, pp. 9–44, 1988.
2. Sutton, R.S. and Barto, A.G., Reinforcement Learning: An Introduction, MIT Press, Cambridge, MA, 1998.
3. Yasuharu Koike and Kenji Doya, A Driver Model Based on Reinforcement Learning with Multiple-Step State Estimation, IEICE Transactions, Vol. J84-D-II, No. 2, pp. 370–379.
4. Kazuyuki Samejima, Ken’ichi Katagiri, Kenji Doya and Mituo Kawato, Multiple Model-based Reinforcement Learning of Nonlinear Control, IEICE Transactions, Vol. J83-D-II, No. 9, pp. 2092–2106.
5. Christian Balkenius and Jan Morén, Dynamics of a Classical Conditioning Model, ICANN 98, Perspectives in Neural Computing, Springer-Verlag, 1999.