Abstract
In this paper, we deal with a linear quadratic optimal control problem with unknown dynamics. As a modeling assumption, we suppose that the agent's knowledge of the current system is represented by a probability distribution $\pi$ on the space of matrices. Furthermore, we assume that this probability measure is suitably updated to account for the experience the agent gains while exploring the environment, so that it approximates the underlying dynamics with increasing accuracy. Under these assumptions, we show that the optimal control obtained by solving the “average” linear quadratic optimal control problem with respect to a given $\pi$ converges to the optimal control of the linear quadratic optimal control problem governed by the actual, underlying dynamics. This approach is closely related to model-based reinforcement learning algorithms, in which prior and posterior probability distributions describing the knowledge of the uncertain system are recursively updated. In the last section, we present a numerical test that confirms the theoretical results.
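To make the convergence statement concrete, here is a minimal numerical sketch. It is not the paper's own experiment, and it rests on several simplifying assumptions that do not come from the abstract: discrete time, a finite horizon `N`, uncertainty only in the state matrix `A` (with `B` known), and a distribution with finite support. The function names (`stacked_maps`, `averaged_lq_control`) and all numerical values are hypothetical. Under these assumptions the averaged cost is quadratic in the open-loop control, so its minimizer solves a single linear system.

```python
import numpy as np

def stacked_maps(A, B, N):
    """Return (Phi, Gamma) such that the stacked states x_1..x_N equal Phi @ x0 + Gamma @ u."""
    n, m = B.shape
    Phi = np.vstack([np.linalg.matrix_power(A, k) for k in range(1, N + 1)])
    Gamma = np.zeros((N * n, N * m))
    for k in range(1, N + 1):      # block row for x_k
        for j in range(k):         # u_j reaches x_k through A^(k-1-j) B
            Gamma[(k - 1) * n:k * n, j * m:(j + 1) * m] = (
                np.linalg.matrix_power(A, k - 1 - j) @ B
            )
    return Phi, Gamma

def averaged_lq_control(As, probs, B, Q, R, x0, N):
    """Open-loop control minimizing sum_i probs[i] * J(u; As[i]), where
    J(u; A) = sum_{k=1..N} x_k' Q x_k + sum_{k=0..N-1} u_k' R u_k."""
    n, m = B.shape
    Qbar = np.kron(np.eye(N), Q)
    Rbar = np.kron(np.eye(N), R)
    H = Rbar.copy()                # probabilities sum to one, so R enters once
    g = np.zeros(N * m)
    for A, p in zip(As, probs):
        Phi, Gamma = stacked_maps(A, B, N)
        H += p * (Gamma.T @ Qbar @ Gamma)
        g += p * (Gamma.T @ Qbar @ Phi @ x0)
    # First-order optimality condition of the averaged quadratic cost: H u = -g
    return np.linalg.solve(H, -g)

# Hypothetical test system: a discretized double integrator.
A_true = np.array([[1.0, 0.1], [0.0, 1.0]])
B = np.array([[0.0], [0.1]])
Q, R = np.eye(2), np.eye(1)
x0 = np.array([1.0, 0.0])
N = 20

# Two wrong models that the agent has not yet ruled out.
D = np.array([[0.0, 0.0], [0.1, 0.0]])
support = [A_true, A_true + D, A_true - D]

u_true = averaged_lq_control([A_true], [1.0], B, Q, R, x0, N)

def error_for(eps):
    """Distance to the true optimal control when mass eps sits on the wrong models."""
    probs = [1.0 - eps, eps / 2, eps / 2]
    u_avg = averaged_lq_control(support, probs, B, Q, R, x0, N)
    return np.linalg.norm(u_avg - u_true)

for eps in [0.5, 0.1, 0.01, 0.0]:
    print(f"mass on wrong models = {eps:<5} error = {error_for(eps):.3e}")
```

As the weight on the wrong models shrinks, the distribution concentrates on `A_true` and the printed error should shrink toward zero, which is the discrete, finite-support analogue of the convergence described above.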
Funder
Università degli Studi di Roma La Sapienza
Publisher
Springer Science and Business Media LLC
Subject
Applied Mathematics, Control and Optimization, Signal Processing, Control and Systems Engineering