Optimal Greedy Control in Reinforcement Learning

Published: 2022-11-18 Issue: 22 Volume: 22 Page: 8920
ISSN: 1424-8220
Container-title: Sensors
Language: en
Short-container-title: Sensors

Authors:

Gorobtsov Alexander, Sychev Oleg, Orlova Yulia, Smirnov Evgeniy, Grigoreva Olga, Bochkin Alexander, Andreeva Marina

Abstract

We consider the problem of reducing the dimensionality of the state space in the variational approach to the optimal control problem, in particular, in reinforcement learning. The control problem is described by differential algebraic equations consisting of nonlinear differential equations and algebraic constraint equations, interconnected through Lagrange multipliers. The proposed method changes the Lagrange multipliers of one subset on the basis of the Lagrange multipliers of another subset. We present examples of applying the proposed method in robotics and in vibration isolation for transport vehicles. The method is implemented in FRUND, a multibody system dynamics software package.
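The abstract's starting point, differential equations coupled to algebraic constraints through Lagrange multipliers, is the standard constrained multibody formulation M q'' + Gᵀλ = F with G q'' = −(dG/dt) q', where G is the constraint Jacobian. As a minimal sketch, assuming a single planar pendulum written in Cartesian coordinates with the constraint x² + y² − L² = 0 (an illustrative choice, not an example from the paper), the multiplier λ can be obtained by solving the acceleration-level augmented system. This does not reproduce the authors' subset-based multiplier update or anything in FRUND; `constrained_accelerations` and `pendulum_multipliers` are hypothetical helper names.

```python
import numpy as np


def constrained_accelerations(M, F, G, G_dot_qdot):
    """Solve the augmented (saddle-point) system of a constrained mechanical DAE.

    Unknowns are the accelerations q'' and the Lagrange multipliers lam:
        M q'' + G^T lam = F            (dynamics with constraint forces)
        G q''           = -G_dot_qdot  (constraints differentiated twice in time)
    """
    n = M.shape[0]  # number of coordinates
    k = G.shape[0]  # number of constraints, i.e. multipliers
    A = np.block([[M, G.T], [G, np.zeros((k, k))]])
    b = np.concatenate([F, -G_dot_qdot])
    sol = np.linalg.solve(A, b)
    return sol[:n], sol[n:]


def pendulum_multipliers(x, y, vx, vy, m=1.0, g=9.81):
    """Hypothetical example: point mass on a rigid rod, constraint x^2 + y^2 - L^2 = 0."""
    M = m * np.eye(2)                        # mass matrix in Cartesian coordinates
    F = np.array([0.0, -m * g])              # gravity acts in -y
    G = np.array([[2.0 * x, 2.0 * y]])       # Jacobian of the constraint
    G_dot_qdot = np.array([2.0 * (vx**2 + vy**2)])  # (dG/dt) q' for this constraint
    return constrained_accelerations(M, F, G, G_dot_qdot)


acc, lam = pendulum_multipliers(0.0, -1.0, 2.0, 0.0)
print(acc, lam)
```

For a unit-length rod hanging straight down at rest, λ comes out as m·g/2, so the constraint force −Gᵀλ equals the rod tension m·g; giving the bob a horizontal speed v adds the centripetal term m·v²/L to that tension.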

Publisher

MDPI AG

Subject

Electrical and Electronic Engineering; Biochemistry; Instrumentation; Atomic and Molecular Physics, and Optics; Analytical Chemistry

Link

https://www.mdpi.com/1424-8220/22/22/8920/pdf

References (45 articles)

1. Bellman, R. (2010). Dynamic Programming, Princeton University Press. Princeton Landmarks in Mathematics and Physics.

2. Pontryagin, L. (1987). Mathematical Theory of Optimal Processes, Taylor & Francis. Classics of Soviet Mathematics.

3. Heess, N., Dhruva, T., Sriram, S., Lemmon, J., Merel, J., Wayne, G., Tassa, Y., Erez, T., Wang, Z., and Eslami, S.M.A. (2017). Emergence of Locomotion Behaviours in Rich Environments. arXiv.

4. Tassa, Y., Erez, T., and Todorov, E. (2012, October 7–12). Synthesis and stabilization of complex behaviors through online trajectory optimization. Proceedings of the 2012 IEEE/RSJ International Conference on Intelligent Robots and Systems, IROS 2012, Vilamoura, Algarve, Portugal.

5. Schulman, J., Moritz, P., Levine, S., Jordan, M.I., and Abbeel, P. (2016, May 2–4). High-Dimensional Continuous Control Using Generalized Advantage Estimation. Proceedings of the 4th International Conference on Learning Representations, ICLR 2016, San Juan, Puerto Rico.