Provably Efficient Reinforcement Learning with Linear Function Approximation

Published: 2023-08
Volume: 48, Issue: 3, Pages: 1496-1521
ISSN: 0364-765X
Journal: Mathematics of Operations Research (short title: Mathematics of OR)
Language: English

Authors:

Chi Jin¹, Zhuoran Yang², Zhaoran Wang³, Michael I. Jordan⁴

Affiliations:

1. Princeton University, Princeton, New Jersey 08544;

2. Yale University, New Haven, Connecticut 06520;

3. Northwestern University, Evanston, Illinois 60208;

4. University of California, Berkeley, Berkeley, California 94720

Abstract

Modern reinforcement learning (RL) is commonly applied to practical problems with an enormous number of states, where function approximation must be deployed to approximate either the value function or the policy. The introduction of function approximation raises a fundamental set of challenges involving computational and statistical efficiency, especially given the need to manage the exploration/exploitation trade-off. As a result, a core RL question remains open: how can we design provably efficient RL algorithms that incorporate function approximation? This question persists even in a basic setting with linear dynamics and linear rewards, for which only linear function approximation is needed. This paper presents the first provable RL algorithm with both polynomial run time and polynomial sample complexity in this linear setting, without requiring a “simulator” or additional assumptions. Concretely, we prove that an optimistic modification of least-squares value iteration, a classical algorithm frequently studied in the linear setting, achieves Õ(√(d³H³T)) regret, where d is the ambient dimension of feature space, H is the length of each episode, and T is the total number of steps. Importantly, such regret is independent of the number of states and actions.

Funding: This work was supported by the Defense Advanced Research Projects Agency program on Lifelong Learning Machines.
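The abstract names the optimistic least-squares value iteration algorithm (LSVI-UCB) without showing its update. As a rough illustration only, the pure-Python sketch below follows the commonly described form. For h = H, ..., 1 it fits w_h by ridge regression of r_h + max_a Q_{h+1}(x_{h+1}, a) on φ(x_h, a_h), with Λ_h = λI + Σ φφᵀ taken over earlier episodes. It then acts greedily on Q_h(x, a) = min{w_hᵀφ(x, a) + β·(φ(x, a)ᵀΛ_h⁻¹φ(x, a))^{1/2}, H}. Several parts are assumptions made for illustration and are not taken from the paper:

- the function names (`lsvi_ucb`, `sherman_morrison`, `one_hot_phi`);
- the fixed bonus coefficient `beta` (the paper sets its bonus scale from the analysis);
- the choice λ = 1;
- a hypothetical two-state toy problem with one-hot features, which is a tabular MDP and therefore a special case of the linear setting with d = |S|·|A|.

This is not the authors' code.

```python
import math


def dot(u, v):
    return sum(a * b for a, b in zip(u, v))


def matvec(m, v):
    return [dot(row, v) for row in m]


def sherman_morrison(inv, f):
    """Return (A + f f^T)^{-1} given inv = A^{-1} for symmetric A (rank-one update)."""
    u = matvec(inv, f)  # u = A^{-1} f
    denom = 1.0 + dot(f, u)
    n = len(f)
    return [[inv[i][j] - u[i] * u[j] / denom for j in range(n)] for i in range(n)]


def lsvi_ucb(phi, reset, step, actions, d, H, K, beta, lam=1.0):
    """Sketch of optimistic least-squares value iteration (LSVI-UCB style).

    phi(x, a) -> list of d floats; reset() -> initial state;
    step(x, a) -> (reward, next_state). These hooks are hypothetical, caller-supplied.
    beta is a hand-picked bonus coefficient, not the paper's theoretical value.
    """
    # inv[h] holds Lambda_h^{-1}, starting from (lam * I)^{-1}.
    inv = [[[1.0 / lam if i == j else 0.0 for j in range(d)] for i in range(d)]
           for _ in range(H)]
    data = [[] for _ in range(H)]  # data[h]: (phi(x_h, a_h), r_h, x_{h+1}) from past episodes
    w = [[0.0] * d for _ in range(H)]
    returns = []

    def q(h, x, a):
        # Q_h(x, a) = min(w_h . phi + beta * sqrt(phi^T Lambda_h^{-1} phi), H), and Q = 0
        # past the last step (steps are 0-indexed, so index H is the paper's step H + 1).
        if h == H:
            return 0.0
        f = phi(x, a)
        bonus = beta * math.sqrt(dot(f, matvec(inv[h], f)))
        return min(dot(w[h], f) + bonus, float(H))

    for _ in range(K):
        # Backward pass: ridge-regress r_h + max_a Q_{h+1}(x_{h+1}, a) onto phi(x_h, a_h).
        for h in reversed(range(H)):
            target = [0.0] * d
            for f, r, x_next in data[h]:
                y = r + max(q(h + 1, x_next, b) for b in actions)
                target = [t + y * fi for t, fi in zip(target, f)]
            w[h] = matvec(inv[h], target)

        # Forward pass: act greedily on the optimistic Q (ties go to the first action).
        x, total = reset(), 0.0
        for h in range(H):
            a = max(actions, key=lambda b: q(h, x, b))
            r, x_next = step(x, a)
            f = phi(x, a)
            data[h].append((f, r, x_next))
            inv[h] = sherman_morrison(inv[h], f)  # Lambda_h += f f^T for later episodes
            total += r
            x = x_next
        returns.append(total)
    return returns


# Hypothetical toy problem: 2 states x 2 actions with one-hot features (d = 4).
# Action 1 pays reward 1, action 0 pays 0, and the next state equals the action.
def one_hot_phi(x, a):
    f = [0.0] * 4
    f[2 * x + a] = 1.0
    return f


returns = lsvi_ucb(one_hot_phi, reset=lambda: 0,
                   step=lambda x, a: (float(a == 1), a),
                   actions=[0, 1], d=4, H=2, K=300, beta=2.0)
print(returns[:5], returns[-5:])
```

With `beta=2.0`, every unvisited state-action pair starts at the cap H, so the greedy rule is pulled toward untried actions. On this toy problem the later episodes should settle on the optimal return of H = 2. The sketch maintains Λ_h⁻¹ with Sherman-Morrison rank-one updates instead of re-inverting a d×d matrix every episode. That is an implementation choice, not something the abstract specifies.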

Publisher

Institute for Operations Research and the Management Sciences (INFORMS)

Subject

Management Science and Operations Research, Computer Science Applications, General Mathematics

Link

https://pubsonline.informs.org/doi/pdf/10.1287/moor.2022.1309

Cited by 8 articles.

1. Provably Efficient Offline Reinforcement Learning With Trajectory-Wise Reward. IEEE Transactions on Information Theory, 2024-09.

2. High-Probability Sample Complexities for Policy Evaluation With Linear Function Approximation. IEEE Transactions on Information Theory, 2024-08.

3. Delayed MDPs with Feature Mapping. 2024 International Joint Conference on Neural Networks (IJCNN), 2024-06-30.

4. Uncertainty-Aware Rank-One MIMO Q Network Framework for Accelerated Offline Reinforcement Learning. IEEE Access, 2024.

5. Efficient Incremental Offline Reinforcement Learning With Sparse Broad Critic Approximation. IEEE Transactions on Systems, Man, and Cybernetics: Systems, 2024-01.