Convergence results for an averaged LQR problem with applications to reinforcement learning

Published:2021-07-08 Issue:3 Volume:33 Page:379-411
ISSN:0932-4194
Container-title:Mathematics of Control, Signals, and Systems
language:en
Short-container-title:Math. Control Signals Syst.

Author:

Pesare Andrea, Palladino Michele, Falcone Maurizio

Abstract

In this paper, we will deal with a linear quadratic optimal control problem with unknown dynamics. As a modeling assumption, we will suppose that the knowledge that an agent has on the current system is represented by a probability distribution $\pi$ on the space of matrices. Furthermore, we will assume that such a probability measure is suitably updated to take into account the increased experience that the agent obtains while exploring the environment, approximating with increasing accuracy the underlying dynamics. Under these assumptions, we will show that the optimal control obtained by solving the “average” linear quadratic optimal control problem with respect to a certain $\pi$ converges to the optimal control of the linear quadratic optimal control problem governed by the actual, underlying dynamics. This approach is closely related to model-based reinforcement learning algorithms where prior and posterior probability distributions describing the knowledge on the uncertain system are recursively updated. In the last section, we will show a numerical test that confirms the theoretical results.
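The convergence claim in the abstract can be illustrated with a small numerical sketch. This is not the authors' setting or code: it assumes a scalar discrete-time system x_{k+1} = a x_k + b u_k with known b, an open-loop control sequence, and a distribution over the unknown coefficient a with finite support. The names `lifted_maps` and `averaged_optimal_control`, and all parameter values, are hypothetical. Under these assumptions the averaged cost is quadratic in the control, so its minimizer solves a linear system, and one can watch it approach the control of the true system as the distribution concentrates.

```python
import numpy as np


def lifted_maps(a, b, horizon):
    """Return F, G such that (x_1, ..., x_N) = F * x0 + G @ u for x_{k+1} = a x_k + b u_k."""
    F = np.array([a ** k for k in range(1, horizon + 1)])
    G = np.zeros((horizon, horizon))
    for k in range(1, horizon + 1):
        for j in range(k):
            G[k - 1, j] = a ** (k - 1 - j) * b
    return F, G


def averaged_optimal_control(a_values, weights, b, x0, horizon, q=1.0, r=1.0, qf=1.0):
    """Minimize the weighted average over a of sum_k (q x_k^2 + r u_k^2) + qf x_N^2.

    The averaged cost is quadratic in u, so the minimizer solves H u = -g.
    """
    W = np.diag([q] * (horizon - 1) + [qf])  # state weights for x_1, ..., x_N
    H = r * np.eye(horizon)
    g = np.zeros(horizon)
    for a, w in zip(a_values, weights):
        F, G = lifted_maps(a, b, horizon)
        H += w * (G.T @ W @ G)
        g += w * (G.T @ W @ F) * x0
    return -np.linalg.solve(H, g)


# Assumed "true" system and illustrative parameters (not taken from the paper).
a_true, b, x0, horizon = 0.9, 1.0, 1.0, 5
u_true = averaged_optimal_control([a_true], [1.0], b, x0, horizon)

# Three-point distributions centred at a_true whose spread shrinks,
# standing in for an agent whose knowledge of the dynamics improves.
nodes = np.array([-1.0, 0.0, 1.0])
weights = np.array([0.25, 0.5, 0.25])
spreads = [0.4, 0.2, 0.1, 0.05]

errors = []
for s in spreads:
    u_avg = averaged_optimal_control(a_true + s * nodes, weights, b, x0, horizon)
    errors.append(float(np.linalg.norm(u_avg - u_true)))

print(errors)  # distance between the averaged and the true optimal control per spread
```

A point-mass distribution recovers the ordinary LQR problem, which is why `u_true` is computed with the same routine. The open-loop formulation is chosen here only because a single control sequence must serve every candidate system at once.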

Funder

Università degli Studi di Roma La Sapienza

Publisher

Springer Science and Business Media LLC

Subject

Applied Mathematics,Control and Optimization,Signal Processing,Control and Systems Engineering

Link

https://link.springer.com/content/pdf/10.1007/s00498-021-00294-y.pdf

Reference: 30 articles.

1. Atkeson CG, Santamaria JC (1997) A comparison of direct and model-based reinforcement learning. In: Proceedings of international conference on robotics and automation, vol 4, pp 3557–3564. https://doi.org/10.1109/ROBOT.1997.606886

2. Bettiol P, Khalil N (2019) Necessary optimality conditions for average cost minimization problems. Discrete Continuous Dyn Syst B 24(5):2093

3. Chowdhary G, Kingravi HA, How JP, Vela PA (2013) A Bayesian nonparametric approach to adaptive control using Gaussian processes. In: CDC, IEEE, pp 874–879

4. Chua K, Calandra R, McAllister R, Levine S (2018) Deep reinforcement learning in a handful of trials using probabilistic dynamics models. In: Advances in neural information processing systems, vol 2018-December, pp 4754–4765

5. Coppel WA (1975) 18.-Linear-quadratic optimal control. Proc R Soc Edinb Sect A Math 73:271–289

Cited by 3 articles.

1. Optimal harvesting policy for biological resources with uncertain heterogeneity for application in fisheries management;Natural Resource Modeling;2024-01-31

2. Optimal control of ensembles of dynamical systems;ESAIM: Control, Optimisation and Calculus of Variations;2023

3. A New Algorithm for the LQR Problem with Partially Unknown Dynamics;Large-Scale Scientific Computing;2022