Toward a Theoretical Foundation of Policy Optimization for Learning Control Policies

Published: 2023-05-03 Issue: 1 Volume: 6 Pages: 123-158
ISSN: 2573-5144
Journal: Annual Review of Control, Robotics, and Autonomous Systems
Language: en
Journal abbreviation: Annu. Rev. Control Robot. Auton. Syst.

Authors:

Hu Bin¹, Zhang Kaiqing²,³, Li Na⁴, Mesbahi Mehran⁵, Fazel Maryam⁶, Başar Tamer¹

Affiliations:

1. Coordinated Science Laboratory and Department of Electrical and Computer Engineering, University of Illinois Urbana-Champaign, Urbana, Illinois, USA;

2. Laboratory for Information and Decision Systems and Computer Science and Artificial Intelligence Laboratory, Massachusetts Institute of Technology, Cambridge, Massachusetts, USA;

3. Current affiliation: Department of Electrical and Computer Engineering and Institute for Systems Research, University of Maryland, College Park, Maryland, USA;

4. School of Engineering and Applied Sciences, Harvard University, Cambridge, Massachusetts, USA;

5. Department of Aeronautics and Astronautics, University of Washington, Seattle, Washington, USA;

6. Department of Electrical and Computer Engineering, University of Washington, Seattle, Washington, USA;

Abstract

Gradient-based methods have been widely used for system design and optimization in diverse application domains. Recently, there has been renewed interest in studying the theoretical properties of these methods in the context of control and reinforcement learning. This article surveys some of the recent developments on policy optimization, a gradient-based iterative approach for feedback control synthesis that has been popularized by the successes of reinforcement learning. We take an interdisciplinary perspective in our exposition that connects control theory, reinforcement learning, and large-scale optimization. We review a number of recently developed theoretical results on the optimization landscape, global convergence, and sample complexity of gradient-based methods for various continuous control problems, such as the linear quadratic regulator (LQR), H∞ control, risk-sensitive control, linear quadratic Gaussian (LQG) control, and output feedback synthesis. In conjunction with these optimization results, we also discuss how direct policy optimization handles stability and robustness, two main desiderata in control engineering, in learning-based control. We conclude the survey by pointing out several challenges and opportunities at the intersection of learning and control.
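
The policy-optimization viewpoint summarized above treats the feedback gain itself as the decision variable and runs gradient descent on the closed-loop cost. As a minimal sketch of that idea, and not code from the survey, the block below assumes a hypothetical scalar discrete-time LQR problem x_{t+1} = a x_t + b u_t with static policy u_t = -k x_t, stage cost q x_t² + r u_t², and E[x_0²] = 1. All function names and the example parameters (a = 1.2, b = q = r = 1) are illustrative assumptions. The gradient used is the scalar case of the standard LQR policy-gradient expression 2((r + b²P_k)k - abP_k)Σ_k, and a simple step-halving rule (also an assumption of this sketch) keeps every iterate stabilizing, echoing the stability concern raised in the abstract.

```python
"""Exact policy gradient on a hypothetical scalar LQR problem (illustrative sketch).

Assumed model: x_{t+1} = a*x_t + b*u_t, policy u_t = -k*x_t,
cost J(k) = sum_t E[q*x_t**2 + r*u_t**2] with E[x_0**2] = 1.
"""


def closed_loop(a: float, b: float, k: float) -> float:
    """Closed-loop coefficient c = a - b*k under the policy u = -k*x."""
    return a - b * k


def is_stabilizing(a: float, b: float, k: float) -> bool:
    """In discrete time the gain is stabilizing iff |a - b*k| < 1."""
    return abs(closed_loop(a, b, k)) < 1.0


def cost(a: float, b: float, q: float, r: float, k: float) -> float:
    """J(k) = (q + r*k^2) / (1 - c^2); only valid for stabilizing k."""
    c = closed_loop(a, b, k)
    return (q + r * k * k) / (1.0 - c * c)


def grad(a: float, b: float, q: float, r: float, k: float) -> float:
    """dJ/dk = 2*((r + b^2*P_k)*k - a*b*P_k) * Sigma_k for stabilizing k."""
    c = closed_loop(a, b, k)
    p = (q + r * k * k) / (1.0 - c * c)  # value coefficient P_k (equals J(k) here)
    sigma = 1.0 / (1.0 - c * c)          # Sigma_k = sum_t E[x_t^2]
    return 2.0 * ((r + b * b * p) * k - a * b * p) * sigma


def riccati_gain(a: float, b: float, q: float, r: float, iters: int = 1000) -> float:
    """Reference optimal gain from iterating the scalar discrete-time Riccati equation."""
    p = 0.0
    for _ in range(iters):
        p = q + a * a * p - (a * b * p) ** 2 / (r + b * b * p)
    return a * b * p / (r + b * b * p)


def policy_gradient(a: float, b: float, q: float, r: float, k0: float,
                    step: float = 0.1, iters: int = 500) -> float:
    """Gradient descent on k; halves the step until the update stays
    stabilizing and does not increase the cost."""
    if not is_stabilizing(a, b, k0):
        raise ValueError("initial gain k0 must be stabilizing")
    k = k0
    for _ in range(iters):
        g = grad(a, b, q, r, k)
        eta = step
        while True:
            k_new = k - eta * g
            if is_stabilizing(a, b, k_new) and cost(a, b, q, r, k_new) <= cost(a, b, q, r, k):
                break
            eta /= 2.0
            if eta < 1e-12:  # no acceptable step: treat k as converged
                return k
        k = k_new
    return k


if __name__ == "__main__":
    a, b, q, r = 1.2, 1.0, 1.0, 1.0  # |a| > 1, so the open loop is unstable
    k_pg = policy_gradient(a, b, q, r, k0=1.0)  # k0 = 1.0 gives c = 0.2 (stabilizing)
    k_star = riccati_gain(a, b, q, r)
    print(f"policy gradient k = {k_pg:.4f}, Riccati k* = {k_star:.4f}")
```

For this open-loop unstable example, the gain found by gradient descent should match the Riccati gain (about 0.7935) to several decimal places. The Riccati iteration is included only as a sanity check; the point of the sketch is that plain gradient descent over stabilizing gains reaches the same optimum without solving the Riccati equation directly.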

Publisher

Annual Reviews

Subject

Artificial Intelligence, Human-Computer Interaction, Engineering (miscellaneous), Control and Systems Engineering

Link

https://www.annualreviews.org/doi/pdf/10.1146/annurev-control-042920-020021

References: 161 articles

Cited by 11 articles.

1. On the Optimization Landscape of Dynamic Output Feedback Linear Quadratic Control; IEEE Transactions on Automatic Control; 2024-02

2. On the Sample Complexity of the Linear Quadratic Gaussian Regulator; 2023 62nd IEEE Conference on Decision and Control (CDC); 2023-12-13

3. Data-Enabled Policy Optimization for the Linear Quadratic Regulator; 2023 62nd IEEE Conference on Decision and Control (CDC); 2023-12-13

4. Neural Operators for Hyperbolic PDE Backstepping Feedback Laws; 2023 62nd IEEE Conference on Decision and Control (CDC); 2023-12-13

5. Natural Policy Gradient Preserves Spatial Decay Properties for Control of Networked Dynamical Systems; 2023 62nd IEEE Conference on Decision and Control (CDC); 2023-12-13