Stochastic linear quadratic optimal tracking control for discrete-time systems with delays based on Q-learning algorithm

Published:2023 Issue:5 Volume:8 Page:10249-10265
ISSN:2473-6988
Container-title:AIMS Mathematics
Short-container-title:MATH

Author:

Tan Xufeng¹, Li Yuan¹, Liu Yang²

Affiliation:

1. School of Science, Shenyang University of Technology, Shenyang 110870, China

2. School of Electrical and Electronic Engineering, Shenyang University of Technology, Shenyang 110870, China

Abstract

In this paper, a reinforcement Q-learning method based on value iteration (VI) is proposed for a class of model-free stochastic linear quadratic (SLQ) optimal tracking problems with time delay. Compared with traditional reinforcement learning methods, the Q-learning method avoids the need for an accurate system model. First, a delay operator is introduced to construct a novel augmented system composed of the original system and the command generator. Second, the SLQ optimal tracking problem is transformed into a deterministic one by system transformation, and the corresponding Q-function of SLQ optimal tracking control is derived. Based on this, a Q-learning algorithm is proposed and its convergence is proved. Finally, a simulation example shows the effectiveness of the proposed algorithm.
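
The general idea of value-iteration Q-learning with a quadratic Q-function can be illustrated on a much simpler problem than the one the paper treats. The sketch below is not the authors' algorithm: it assumes a scalar, deterministic, delay-free regulation problem (no reference signal, no multiplicative noise), and the plant parameters `A_TRUE` and `B_TRUE`, the cost weights, and all function names are hypothetical choices made for illustration. The learner sees only sampled transitions `(x, u, x_next)` and never reads the plant parameters.

```python
# Hypothetical scalar plant x_next = a*x + b*u, used ONLY to generate data.
A_TRUE, B_TRUE = 0.9, 0.5
# Assumed stage cost: Q_COST * x^2 + R_COST * u^2.
Q_COST, R_COST = 1.0, 1.0


def step(x, u):
    """Simulate one plant step; the learner never reads A_TRUE or B_TRUE."""
    return A_TRUE * x + B_TRUE * u


def solve3(m, v):
    """Solve the 3x3 system m h = v by Gaussian elimination with partial pivoting."""
    a = [row[:] + [rhs] for row, rhs in zip(m, v)]
    n = 3
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(a[r][col]))
        a[col], a[pivot] = a[pivot], a[col]
        for r in range(col + 1, n):
            f = a[r][col] / a[col][col]
            for c in range(col, n + 1):
                a[r][c] -= f * a[col][c]
    h = [0.0] * n
    for i in reversed(range(n)):
        s = a[i][n] - sum(a[i][j] * h[j] for j in range(i + 1, n))
        h[i] = s / a[i][i]
    return h


def q_learning_vi(samples, iterations=200):
    """Fit Q(x, u) = h11*x^2 + 2*h12*x*u + h22*u^2 from data by value iteration."""
    # Regressors of the quadratic Q-function, one row per sampled (x, u).
    phi = [[x * x, 2.0 * x * u, u * u] for x, u, _ in samples]
    p = 0.0  # value kernel of the previous iterate, V_j(x) = p * x^2, with V_0 = 0
    h = [0.0, 0.0, 0.0]
    for _ in range(iterations):
        # Bellman target: stage cost plus the previous value at the next state.
        targets = [Q_COST * x * x + R_COST * u * u + p * xn * xn
                   for x, u, xn in samples]
        h = solve3(phi, targets)
        h11, h12, h22 = h
        # Minimizing Q over u gives V(x) = (h11 - h12^2 / h22) * x^2.
        p = h11 - h12 * h12 / h22
    gain = h[1] / h[2]  # greedy policy u = -gain * x
    return h, p, gain


# Three noise-free transitions are enough to identify the three Q parameters.
samples = [(x, u, step(x, u)) for x, u in [(1.0, 0.0), (0.0, 1.0), (1.0, 1.0)]]
h, p, gain = q_learning_vi(samples)
print("value kernel:", p, "feedback gain:", gain)
```

In this simplified setting the iteration reduces to the standard Riccati recursion, so the learned value kernel should satisfy the scalar discrete-time algebraic Riccati equation for the assumed plant. Handling delays, a command generator, and stochastic dynamics, as the abstract describes, would require the augmented-system construction from the paper itself.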

Publisher

American Institute of Mathematical Sciences (AIMS)

Subject

General Mathematics

References: 28 articles.

1. H. Modares, F. L. Lewis, Optimal tracking control of nonlinear partially-unknown constrained-input systems using integral reinforcement learning, Automatica, 50 (2014), 1780–1792. https://doi.org/10.1016/j.automatica.2014.05.011

2. B. Zhao, Y. Li, Model-free adaptive dynamic programming based near-optimal decentralized tracking control of reconfigurable manipulators, Int. J. Control, Autom. Syst., 16 (2018), 478–490. https://doi.org/10.1007/s12555-016-0711-5

3. T. Huang, D. Liu, A self-learning scheme for residential energy system control and management, Neural Comput. Appl., 22 (2013), 259–269. https://doi.org/10.1007/s00521-011-0711-6

4. M. Gluzman, J. G. Scott, A. Vladimirsky, Optimizing adaptive cancer therapy: dynamic programming and evolutionary game theory, Proc. Royal Soc. B: Biol. Sci., 287 (2020), 20192454. https://doi.org/10.1098/rspb.2019.2454

5. I. Ha, E. Gilbert, Robust tracking in nonlinear systems, IEEE Trans. Automat. Control, 32 (1987), 763–771. https://doi.org/10.1109/TAC.1987.1104710

Cited by 1 article.

1. Data‐driven policy iteration algorithm for continuous‐time stochastic linear‐quadratic optimal control problems, Asian Journal of Control, 2023-09-05.