Affiliation:
1. Robotics and Advanced Manufacturing Department, Research Center for Advanced Studies (CINVESTAV), Ramos Arizpe 25900, Mexico
2. Aeronautical Engineering Program and Postgraduate Program in Aerospace Engineering, Univ. Politécnica Metropolitana de Hidalgo, Tolcayuca 43860, Mexico
Abstract
Some critical tasks require refined actions near the target: for instance, steering a car in a crowded parking lot or landing a rocket. These tasks are critical because violating the constraints near the target may lead to a fatal (unrecoverable) condition; thus, higher action resolution is required near the target to increase maneuvering precision. Completing the task becomes even more challenging if the environment changes or is uncertain, so novel approaches have been proposed for these problems. In particular, reinforcement learning schemes such as Q-learning learn from scratch by exploring action–state causal relationships to find action decisions that increase the reward. Q-learning iteratively refines its action inputs by exploring the state space to maximize the reward. However, shrinking the (constant-size) discretization boxes to the resolution needed for critical tasks increases the number of boxes and the computational load, which may lead to the curse of dimensionality. This paper proposes a variable box method that keeps the number of boxes low while reducing box size only near the target, increasing action resolution where it is needed. The proposal is applied to a critical task, landing a solid rocket, whose dynamics are highly nonlinear, underactuated, non-affine, and subject to environmental disturbances. Simulations show a successful landing without incurring the curse of dimensionality typical of the classical (constant box) Q-learning scheme.
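A minimal sketch of the variable box idea for a one-dimensional state, assuming nonuniform bin edges that are dense only inside a window around the target. The function names, parameters, and window shape here are illustrative assumptions, not the paper's actual discretization:

```python
import numpy as np

def variable_box_edges(lo, hi, target, n_coarse, n_fine, fine_halfwidth):
    """Bin edges dense near `target`, coarse elsewhere (illustrative only)."""
    fine_lo = max(lo, target - fine_halfwidth)
    fine_hi = min(hi, target + fine_halfwidth)
    coarse = np.linspace(lo, hi, n_coarse + 1)        # coarse grid over the whole range
    fine = np.linspace(fine_lo, fine_hi, n_fine + 1)  # fine grid only near the target
    return np.unique(np.concatenate([coarse, fine]))  # merge, drop duplicate edges

def box_index(x, edges):
    """Map a continuous state x to its box index."""
    return int(np.clip(np.searchsorted(edges, x) - 1, 0, len(edges) - 2))

# Example: state in [-10, 10], target at 0, fine window of half-width 1.
edges = variable_box_edges(-10.0, 10.0, 0.0, n_coarse=10, n_fine=20,
                           fine_halfwidth=1.0)
# Boxes near the target are much narrower than boxes far from it,
# yet the total box count stays far below a uniformly fine grid.
i_near = box_index(0.05, edges)   # box containing a state near the target
i_far = box_index(9.0, edges)     # box containing a state far away
```

With these illustrative numbers, the grid uses about 30 boxes, whereas a uniform grid at the same near-target resolution (0.1) over [-10, 10] would need 200, which is the growth the variable box method avoids.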
Subject
Electrical and Electronic Engineering, Industrial and Manufacturing Engineering, Control and Optimization, Mechanical Engineering, Computer Science (miscellaneous), Control and Systems Engineering