Least-Squares Methods for Policy Iteration

Published: 2012 Page: 75-109
ISSN: 1867-4534
Container-title: Adaptation, Learning, and Optimization

Author:

Buşoniu Lucian, Lazaric Alessandro, Ghavamzadeh Mohammad, Munos Rémi, Babuška Robert, De Schutter Bart

Publisher

Springer Berlin Heidelberg

Link

http://link.springer.com/content/pdf/10.1007/978-3-642-27645-3_3

References: 57 articles.

1. Antos, A., Szepesvári, C., Munos, R.: Learning near-optimal policies with Bellman-residual minimization based fitted policy iteration and a single sample path. Machine Learning 71(1), 89–129 (2008)

2. Baird, L.: Residual algorithms: Reinforcement learning with function approximation. In: Proceedings 12th International Conference on Machine Learning (ICML-1995), Tahoe City, U.S., pp. 30–37 (1995)

3. Bertsekas, D.P.: A counterexample to temporal differences learning. Neural Computation 7, 270–279 (1995)

4. Bertsekas, D.P.: Approximate dynamic programming. In: Dynamic Programming and Optimal Control, Ch. 6, vol. 2 (2010), http://web.mit.edu/dimitrib/www/dpchapter.html

5. Bertsekas, D.P.: Approximate policy iteration: A survey and some new methods. Journal of Control Theory and Applications 9(3), 310–335 (2011a)

Cited by 8 articles.

1. A concentration bound for LSPE(λ);Systems & Control Letters;2023-01

2. Approximate Dynamic Programming and Reinforcement Learning for Continuous States;EURO Advanced Tutorials on Operational Research;2021

3. Introduction;Multi‐Agent Coordination;2020-11-06

4. Least squares policy iteration with instrumental variables vs. direct policy search: comparison against optimal benchmarks using energy storage;INFOR: Information Systems and Operational Research;2019-06-19

5. Bellman residuals minimization using online support vector machines;Applied Intelligence;2017-04-18