Author:
Buşoniu Lucian,Lazaric Alessandro,Ghavamzadeh Mohammad,Munos Rémi,Babuška Robert,De Schutter Bart
Publisher:
Springer Berlin Heidelberg
References: 57 articles.
1. Antos, A., Szepesvári, C., Munos, R.: Learning near-optimal policies with Bellman-residual minimization based fitted policy iteration and a single sample path. Machine Learning 71(1), 89–129 (2008)
2. Baird, L.: Residual algorithms: Reinforcement learning with function approximation. In: Proceedings 12th International Conference on Machine Learning (ICML-1995), Tahoe City, U.S., pp. 30–37 (1995)
3. Bertsekas, D.P.: A counterexample to temporal differences learning. Neural Computation 7, 270–279 (1995)
4. Bertsekas, D.P.: Approximate dynamic programming. In: Dynamic Programming and Optimal Control, Ch. 6, vol. 2 (2010), http://web.mit.edu/dimitrib/www/dpchapter.html
5. Bertsekas, D.P.: Approximate policy iteration: A survey and some new methods. Journal of Control Theory and Applications 9(3), 310–335 (2011a)
Cited by
8 articles.