(1) R. S. Sutton and A. G. Barto: Reinforcement Learning, MIT Press, Cambridge (1998)
(2) R. J. Williams: “Simple Statistical Gradient-following Algorithms for Connectionist Reinforcement Learning”, Machine Learning, Vol. 8, pp. 229-256 (1992)
(3) H. Kimura, M. Yamamura, and S. Kobayashi: “Reinforcement Learning in Partially Observable Markov Decision Processes: A Stochastic Gradient Method”, Journal of the Japanese Society for Artificial Intelligence, Vol. 11, No. 5, pp. 761-768 (1996) (in Japanese)
(4) L. C. Baird and A. W. Moore: “Gradient Descent for General Reinforcement Learning”, Advances in Neural Information Processing Systems 11, MIT Press, pp. 968-974 (1999)
(5) R. S. Sutton, D. McAllester, S. Singh, and Y. Mansour: “Policy Gradient Methods for Reinforcement Learning with Function Approximation”, Advances in Neural Information Processing Systems 12, MIT Press, pp. 1057-1063 (2000)