Policy gradient methods for discrete time linear quadratic regulator with random parameters

Published: 2024 Volume: 30 Page: 26
ISSN: 1292-8119
Container-title: ESAIM: Control, Optimisation and Calculus of Variations
Short-container-title: ESAIM: COCV

Author:

Li Deyue

Abstract

This paper studies an infinite-horizon optimal control problem for a discrete-time linear system and quadratic criterion, both with random parameters that are independent and identically distributed over time. In this general setting, we apply the policy gradient method, a reinforcement learning technique, to search for the optimal control without requiring knowledge of the statistical properties of the parameters. We investigate the sub-Gaussianity of the state process and establish a global linear convergence guarantee for this approach under assumptions that are weaker and easier to verify than those in existing results. Numerical experiments are presented to illustrate our results.
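
To make the setting in the abstract concrete, the following is a minimal sketch of a model-free policy gradient loop on a toy scalar problem. It is not the paper's algorithm: the scalar dynamics, the Gaussian and uniform parameter distributions, the truncated horizon, the two-point finite-difference gradient estimate, and all function names (`rollout_cost`, `policy_gradient`, `riccati_gain`) are assumptions made for illustration. The learner only sees sampled costs; the Riccati-type recursion uses the parameter moments and is included solely as a reference value for checking.

```python
import numpy as np

def rollout_cost(k, rng, n_rollouts=400, horizon=30):
    """Average truncated cost of the linear policy u_t = -k * x_t."""
    x = np.ones(n_rollouts)                  # fixed initial state x_0 = 1
    total = np.zeros(n_rollouts)
    for _ in range(horizon):
        # i.i.d. random parameters, redrawn at every time step
        # (hypothetical distributions, chosen only for this sketch)
        a = rng.normal(0.9, 0.1, n_rollouts)
        b = rng.normal(0.5, 0.1, n_rollouts)
        q = rng.uniform(0.5, 1.5, n_rollouts)
        r = rng.uniform(0.5, 1.5, n_rollouts)
        u = -k * x
        total += q * x**2 + r * u**2         # stage cost with random weights
        x = a * x + b * u                    # x_{t+1} = a_t x_t + b_t u_t
    return total.mean()

def policy_gradient(k0=0.0, step=0.02, delta=0.05, iters=100, seed=0):
    """Gradient descent on the gain k using only sampled costs."""
    k = k0
    for i in range(iters):
        # Both perturbed gains reuse the same seed (common random numbers),
        # which reduces the variance of the finite-difference estimate.
        j_plus = rollout_cost(k + delta, np.random.default_rng([seed, i]))
        j_minus = rollout_cost(k - delta, np.random.default_rng([seed, i]))
        grad = (j_plus - j_minus) / (2 * delta)
        k -= step * grad
    return k

def riccati_gain(e_aa=0.82, e_ab=0.45, e_bb=0.26, q=1.0, r=1.0, iters=500):
    """Reference gain from the second moments (not available to the learner)."""
    p = q
    for _ in range(iters):
        p = q + e_aa * p - (e_ab * p) ** 2 / (r + e_bb * p)
    return e_ab * p / (r + e_bb * p)

if __name__ == "__main__":
    print("learned gain:  ", policy_gradient())
    print("reference gain:", riccati_gain())
```

The moments passed to `riccati_gain` follow from the assumed distributions: E[a²] = 0.9² + 0.1² = 0.82, E[b²] = 0.5² + 0.1² = 0.26, and E[ab] = 0.45 because a and b are drawn independently. The initial gain k = 0 is mean-square stabilizing here since E[a²] < 1.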

Publisher

EDP Sciences

Link

https://www.esaim-cocv.org/10.1051/cocv/2024014/pdf

References (20 articles)

1. Kalman R.E., Control of randomly varying linear dynamical systems. Proc. Symposia Appl. Math. (1961) 287–298.

2. The equivalent discrete-time optimal control problem for continuous-time systems with stochastic parameters

3. Optimal control of linear plants with random parameters

4. Aoki M., Optimal Control and System Theory in Dynamic Economic Analysis. Vol. 1 of A Series of Volumes in Dynamic Economics: Theory and Applications. North Holland Publishing Company (1976).

5. The uncertainty threshold principle: Some fundamental limitations of optimal decision making under dynamic uncertainty