1. Baird, L. C. (1995). Advantage Learning. To be published as a U.S. Air Force technical report by the Department of Computer Science, U.S. Air Force Academy.
2. Bertsekas, D. P. (1987). Dynamic Programming: Deterministic and Stochastic Models. Prentice-Hall.
3. Bradtke, S. J. (1993). Reinforcement learning applied to linear quadratic regulation. Proceedings of the Fifth Conference on Neural Information Processing Systems (pp. 295–302). Morgan Kaufmann.
4. Rumelhart, D. E., Hinton, G. E., & Williams, R. J. (1986). Learning representations by back-propagating errors. Nature, 323, 533–536.
5. Sutton, R. S. (1988). Learning to predict by the methods of temporal differences. Machine Learning, 3(1), 9–44.