Concentration bounds for temporal difference learning with linear function approximation: the case of batch data and uniform sampling

Published:2021-01-04 Issue:3 Volume:110 Page:559-618
ISSN:0885-6125
Container-title:Machine Learning
language:en
Short-container-title:Mach Learn

Author:

Prashanth L. A., Korda Nathaniel, Munos Rémi

Publisher

Springer Science and Business Media LLC

Subject

Artificial Intelligence, Software

Link

https://link.springer.com/content/pdf/10.1007/s10994-020-05912-5.pdf

Reference: 49 articles.

1. Antos, A., Szepesvári, C., & Munos, R. (2008). Learning near-optimal policies with Bellman-residual minimization based fitted policy iteration and a single sample path. Machine Learning, 71(1), 89–129.

2. Bach, F., & Moulines, E. (2011). Non-asymptotic analysis of stochastic approximation algorithms for machine learning. In Advances in neural information processing systems (pp. 451–459).

3. Bach, F., & Moulines, E. (2013). Non-strongly-convex smooth stochastic approximation with convergence rate O(1/n). In Advances in neural information processing systems (pp. 773–781).

4. Bertsekas, D. P. (2012). Dynamic Programming and Optimal Control, Approximate Dynamic Programming, (4th ed., Vol. II). Belmont: Athena Scientific.

5. Bertsekas, D. P., & Tsitsiklis, J. N. (1996). Neuro-Dynamic Programming (Optimization and Neural Computation Series, 3), (Vol. 7). Belmont: Athena Scientific.

Cited by 5 articles.

1. Reinforcement Learning Recommendation Algorithm Based on Label Value Distribution;Mathematics;2023-06-28

2. A concentration bound for LSPE(λ);Systems & Control Letters;2023-01

3. Concentration of Contractive Stochastic Approximation and Reinforcement Learning;Stochastic Systems;2022-12

4. N-SVRG: Stochastic Variance Reduction Gradient with Noise Reduction Ability for Small Batch Samples;Computer Modeling in Engineering & Sciences;2022

5. Risk-Sensitive Reinforcement Learning via Policy Gradient Search;Foundations and Trends® in Machine Learning;2022