An approach to solving optimal control problems of nonlinear systems by introducing detail-reward mechanism in deep reinforcement learning

Published: 2022 Issue: 9 Volume: 19 Page: 9258-9290
ISSN: 1551-0018
Container-title: Mathematical Biosciences and Engineering
Short-container-title: MBE

Author:

Yao Shixuan¹, Liu Xiaochen², Zhang Yinghui², Cui Ze³

Affiliation:

1. School of Software Engineering, Dalian University of Foreign Languages, Dalian 116044, China

2. School of Mechanical Engineering, Dalian Jiaotong University, Dalian 116028, China

3. School of Control Science and Engineering, Dalian University of Technology, Dalian 116024, China

Abstract

In recent years, dynamic programming and reinforcement learning theory have been widely used to solve control problems for nonlinear control systems (NCS). Much progress has been made in network model construction and system stability analysis, but little research addresses how to establish a control strategy based on the detailed requirements of the control process. Motivated by this gap, this paper proposes a detail-reward mechanism (DRM): a reward function composed of individual detail evaluation functions replaces the utility function in the Hamilton-Jacobi-Bellman (HJB) equation. The method is then introduced into a wider range of deep reinforcement learning algorithms to solve optimization problems in NCS. After a mathematical description of the relevant characteristics of NCS, the stability of the iterative control law is proved using a Lyapunov function. With an inverted pendulum system as the experimental object, a dynamic environment is designed and the reward function is established using the DRM. Three deep reinforcement learning models, based on Deep Q-Networks, policy gradient and actor-critic, are then designed in this environment, and the effects of different reward functions on experimental accuracy are compared. The experimental results show that, in NCS, replacing the utility function in the HJB equation with the DRM better matches the designer's detailed requirements for the whole control process. By observing the characteristics of the system, designing the reward function and selecting an appropriate deep reinforcement learning model, the optimization problem of NCS can be solved.
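The abstract's central idea, a reward built from individual detail evaluation functions, can be illustrated with a minimal Python sketch. Everything concrete here is an assumption made for illustration and is not taken from the paper: the three detail functions, their weights, the state layout `(x, x_dot, theta, theta_dot)`, and the limits `ANGLE_LIMIT`, `TRACK_LIMIT` and `FORCE_LIMIT` are hypothetical choices for an inverted pendulum on a cart.

```python
import math
from typing import Callable, Sequence, Tuple

# Hypothetical state layout: (cart position, cart velocity, pole angle, pole angular velocity)
State = Tuple[float, float, float, float]
DetailFn = Callable[[State, float], float]

# Assumed limits, chosen only for illustration (not from the paper).
ANGLE_LIMIT = math.radians(12.0)  # pole angle at which the angle detail reaches 0
TRACK_LIMIT = 2.4                 # cart displacement at which the position detail reaches 0
FORCE_LIMIT = 10.0                # force magnitude used to normalise the effort penalty


def angle_detail(state: State, action: float) -> float:
    """1.0 when the pole is upright, decreasing linearly to 0.0 at ANGLE_LIMIT."""
    return max(0.0, 1.0 - abs(state[2]) / ANGLE_LIMIT)


def position_detail(state: State, action: float) -> float:
    """1.0 when the cart is centred, decreasing linearly to 0.0 at TRACK_LIMIT."""
    return max(0.0, 1.0 - abs(state[0]) / TRACK_LIMIT)


def effort_detail(state: State, action: float) -> float:
    """Penalty proportional to the applied force magnitude (0.0 for no force)."""
    return -abs(action) / FORCE_LIMIT


# Each requirement on the control process is one (weight, detail function) pair.
DETAILS: Sequence[Tuple[float, DetailFn]] = (
    (1.0, angle_detail),
    (0.5, position_detail),
    (0.1, effort_detail),
)


def detail_reward(state: State, action: float,
                  details: Sequence[Tuple[float, DetailFn]] = DETAILS) -> float:
    """Weighted sum of the individual detail evaluations for one step."""
    return sum(weight * fn(state, action) for weight, fn in details)


if __name__ == "__main__":
    upright = (0.0, 0.0, 0.0, 0.0)
    tilted = (1.0, 0.0, 0.1, 0.0)
    print(detail_reward(upright, 0.0))
    print(detail_reward(tilted, 5.0))
```

In a setup like this, the scalar returned by `detail_reward` would stand in for the per-step utility term, and the same function could be passed unchanged to a Deep Q-Network, policy gradient or actor-critic training loop. Adding or reweighting a requirement then means editing one entry in `DETAILS`.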

Publisher

American Institute of Mathematical Sciences (AIMS)

Subject

Applied Mathematics, Computational Mathematics, General Agricultural and Biological Sciences, Modeling and Simulation, General Medicine

Reference: 61 articles.

1. J. Wu, W. Sun, S. F. Su, Y. Q. Wu, Adaptive quantized control for uncertain nonlinear systems with unknown control directions, Int. J. Robust Nonlinear Control, 31 (2021), 8658–8671. https://doi.org/10.1002/rnc.5748

2. A. Shatyrko, J. Diblík, D. Khusainov, M. Růžičková, Stabilization of Lur'e-type nonlinear control systems by Lyapunov-Krasovskii functionals, Adv. Diff. Equations, 2012 (2012), 1–9. https://doi.org/10.1186/1687-1847-2012-229

3. K. Tatsuya, Limit-cycle-like control for 2-dimensional discrete-time nonlinear control systems and its application to the Hénon map, Commun. Nonlinear Sci. Numer. Simul., 18 (2013), 171–183. https://doi.org/10.1016/j.cnsns.2012.06.012

4. Y. H. Wei, Lyapunov stability theory for nonlinear nabla fractional order systems, IEEE Trans. Circuits Sys., 68 (2021), 3246–3250. https://doi.org/10.1109/TCSII.2021.3063914

5. G. Pola, A. Girard, P. Tabuada, Approximately bisimilar symbolic models for nonlinear control systems, Automatica, 44 (2008), 2508–2516. https://doi.org/10.1016/j.automatica.2008.02.021

Cited by 3 articles.

1. Intelligent computational techniques for physical object properties discovery, detection, and prediction: A comprehensive survey; Computer Science Review; 2024-02

2. Applications of Deep Learning for Drug Discovery Systems with BigData; BioMedInformatics; 2022-11-12

3. Research on Solving Nonlinear Problem of Ball and Beam System by Introducing Detail-Reward Function; Symmetry; 2022-09-08