A Unified Lyapunov Framework for Finite-Sample Analysis of Reinforcement Learning Algorithms

Published: 2022-12-30
Volume: 50
Issue: 3
Pages: 12-15
ISSN: 0163-5999
Container title: ACM SIGMETRICS Performance Evaluation Review
Language: English
Short container title: SIGMETRICS Perform. Eval. Rev.

Author:

Chen Zaiwei¹

Affiliation:

1. Caltech CMS

Abstract

Reinforcement learning (RL) is a paradigm in which an agent learns to accomplish tasks by interacting with its environment, much as humans do. RL is therefore viewed as a promising approach to artificial intelligence, as evidenced by its remarkable empirical successes. However, many RL algorithms are not well understood theoretically, especially when function approximation and off-policy sampling are employed. My thesis [1] aims to develop a thorough theoretical understanding of the performance of various RL algorithms through finite-sample analysis.

Since most RL algorithms are essentially stochastic approximation (SA) algorithms for solving variants of the Bellman equation, the first part of the thesis is dedicated to the analysis of general SA involving a contraction operator under Markovian noise. We develop a Lyapunov approach in which we construct a novel Lyapunov function called the generalized Moreau envelope. The results on SA enable us to establish finite-sample bounds for various RL algorithms in the tabular setting (cf. Part II of the thesis) and with function approximation (cf. Part III of the thesis). These bounds in turn provide theoretical insights into several important problems in the RL community, such as the efficiency of bootstrapping, the bias-variance trade-off in off-policy learning, and the stability of off-policy control.

The main body of this document provides an overview of the contributions of my thesis.
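To make the stochastic approximation viewpoint concrete, here is a minimal Python sketch (not code from the thesis) of tabular TD(0) policy evaluation, one simple member of the class of contractive SA algorithms under Markovian noise described in the abstract. The 3-state Markov reward process, the discount factor 0.9, the step-size schedule alpha_k = 1/(k+1)^0.7, and the helper names `bellman_operator` and `td0_markovian` are hypothetical choices made for illustration. The Bellman operator T(V) = r + gamma * P V is a gamma-contraction in the sup norm, and the iterates are driven by a single trajectory of the Markov chain, so consecutive samples are correlated rather than i.i.d.

```python
import numpy as np


def bellman_operator(V, P, r, gamma):
    """Policy-evaluation Bellman operator T(V) = r + gamma * P V.

    It is a gamma-contraction with respect to the sup norm.
    """
    return r + gamma * P @ V


def td0_markovian(P, r, gamma, num_steps, xi=0.7, seed=0):
    """Tabular TD(0) run as SA along ONE Markov-chain trajectory.

    The noise is Markovian: consecutive samples (S_k, S_{k+1}) come from
    the same trajectory, so they are correlated rather than i.i.d.
    """
    rng = np.random.default_rng(seed)
    n = len(r)
    V = np.zeros(n)
    s = 0  # arbitrary initial state
    for k in range(num_steps):
        s_next = rng.choice(n, p=P[s])  # sample the next state of the chain
        alpha = 1.0 / (k + 1) ** xi  # illustrative diminishing step size
        td_error = r[s] + gamma * V[s_next] - V[s]
        V[s] += alpha * td_error  # only the visited component is updated
        s = s_next
    return V


# Hypothetical 3-state Markov reward process (not taken from the thesis).
P = np.array([[0.5, 0.5, 0.0],
              [0.25, 0.5, 0.25],
              [0.0, 0.5, 0.5]])
r = np.array([1.0, 0.0, -1.0])
gamma = 0.9

# Exact fixed point of T, used as the reference solution.
V_star = np.linalg.solve(np.eye(3) - gamma * P, r)
V_hat = td0_markovian(P, r, gamma, num_steps=50_000)

# Numerical contraction check on two arbitrary vectors.
U, W = np.array([3.0, -1.0, 2.0]), np.zeros(3)
gap = np.max(np.abs(bellman_operator(U, P, r, gamma)
                    - bellman_operator(W, P, r, gamma)))
assert gap <= gamma * np.max(np.abs(U - W)) + 1e-12

print("V* =", V_star)
print("sup-norm error of TD(0):", np.max(np.abs(V_hat - V_star)))
```

The printed error should be small compared with the sup norm of V* (about 1.82 for this chain). The sketch does not reproduce the thesis's finite-sample bounds or its generalized Moreau envelope Lyapunov function. It only illustrates the kind of iteration those bounds describe: a contractive fixed-point equation solved from correlated samples with a diminishing step size.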

Publisher

Association for Computing Machinery (ACM)

Subject

Computer Networks and Communications, Hardware and Architecture, Software

Link

https://dl.acm.org/doi/pdf/10.1145/3579342.3579346

References (16 articles)

1. Zaiwei Chen. "A Unified Lyapunov Framework for Finite-Sample Analysis of Reinforcement Learning Algorithms". Ph.D. thesis, Georgia Institute of Technology, 2022.

2. Zaiwei Chen. "Target Network and Truncation Overcome the Deadly Triad in Q-Learning". Major revision at SIAM Journal on Mathematics of Data Science, 2022.

3. "Finite-Sample Analysis of Off-Policy Natural Actor–Critic With Linear Function Approximation".

4. Zaiwei Chen and Siva Theja Maguluri. "Sample Complexity of Policy-Based Methods under Off-Policy Sampling and Linear Function Approximation". International Conference on Artificial Intelligence and Statistics, PMLR, 2022, pp. 11195-11214.

5. "Stationary Behavior of Constant Stepsize SGD Type Algorithms".

Cited by 1 article.

1. "A Unified Lyapunov Framework for Finite-Sample Analysis of Reinforcement Learning Algorithms". ACM SIGMETRICS Performance Evaluation Review, 2022-12-30.