1. Sutton R S, Barto A G. Reinforcement Learning: an Introduction. 2nd ed. Cambridge: MIT Press, 2018
2. Bertsekas D P, Tsitsiklis J N. Neuro-dynamic programming: an overview. In: Proceedings of the 34th IEEE Conference on Decision and Control. 1995, 560–564
3. Lagoudakis M G, Parr R. Least-squares policy iteration. Journal of Machine Learning Research, 2003, 4: 1107–1149
4. Dann C, Neumann G, Peters J. Policy evaluation with temporal differences: a survey and comparison. Journal of Machine Learning Research, 2014, 15(1): 809–883
5. Geist M, Scherrer B. Off-policy learning with eligibility traces: a survey. Journal of Machine Learning Research, 2014, 15(1): 289–333