Improving the efficiency of reinforcement learning for a spacecraft powered descent with Q-learning

Published:2021-10-04
ISSN:1389-4420
Container-title:Optimization and Engineering
language:en
Short-container-title:Optim Eng

Author:

Wilson Callum, Riccardi Annalisa

Abstract

Reinforcement learning entails many intuitive and useful approaches to solving various problems. Its main premise is to learn how to complete tasks by interacting with the environment and observing which actions are more optimal with respect to a reward signal. Methods from reinforcement learning have long been applied in aerospace and have more recently seen renewed interest in space applications. Problems in spacecraft control can benefit from the use of intelligent techniques when faced with significant uncertainties, as is common for space environments. Solving these control problems using reinforcement learning remains a challenge partly due to long training times and sensitivity in performance to hyperparameters which require careful tuning. In this work we seek to address both issues for a sample spacecraft control problem. To reduce training times compared to other approaches, we simplify the problem by discretising the action space and use a data-efficient algorithm to train the agent. Furthermore, we employ an automated approach to hyperparameter selection which optimises for a specified performance metric. Our approach is tested on a 3-DOF powered descent problem with uncertainties in the initial conditions. We run experiments with two different problem formulations: a ‘shaped’ state representation to guide the agent, and a ‘raw’ state representation with unprocessed values of position, velocity and mass. The results show that an agent can learn a near-optimal policy efficiently by appropriately defining the action-space and state-space. Using the raw state representation led to ‘reward-hacking’ and poor performance, which highlights the importance of the problem and state-space formulation in successfully training reinforcement learning agents. In addition, we show that the optimal hyperparameters can vary significantly based on the choice of loss function. Using two sets of hyperparameters optimised for different loss functions, we demonstrate that in both cases the agent can find near-optimal policies with comparable performance to previously applied methods.

Publisher

Springer Science and Business Media LLC

Subject

Electrical and Electronic Engineering,Control and Optimization,Mechanical Engineering,Aerospace Engineering,Civil and Structural Engineering,Software

Link

https://link.springer.com/content/pdf/10.1007/s11081-021-09687-z.pdf

References (60 articles)

1. Acikmese B, Ploen SR (2007) Convex programming approach to powered descent guidance for Mars landing. J Guid Control Dyn 30(5):1353–1366. https://doi.org/10.2514/1.27553

2. Acikmese B, Carson JM, Blackmore L (2013) Lossless convexification of nonconvex control bound and pointing constraints of the soft landing optimal control problem. IEEE Trans Control Syst Technol 21(6):2104–2113. https://doi.org/10.1109/TCST.2012.2237346

3. Barsce JC, Palombarini JA, Martinez EC (2017) Towards autonomous reinforcement learning: Automatic setting of hyper-parameters using Bayesian optimization. In: 2017 43rd Latin American Computer Conference, CLEI 2017, vol 2017. Institute of Electrical and Electronics Engineers Inc, pp 1–9

4. Barto AG, Sutton RS, Anderson CW (1983) Neuronlike adaptive elements that can solve difficult learning control problems. IEEE Trans Syst Man Cybernet SMC-13(5):834–846

5. Battin RH (1999) An introduction to the mathematics and methods of astrodynamics, Revised Edition. American Institute of Aeronautics and Astronautics. https://doi.org/10.2514/4.861543

Cited by 5 articles.

1. Integrated entry guidance with no-fly zone constraint using reinforcement learning and predictor-corrector technique;Proceedings of the Institution of Mechanical Engineers, Part G: Journal of Aerospace Engineering;2024-03-01

2. Comparative Study on Vibration Control Using Reinforcement Learning;2023 10th International Conference on Recent Advances in Air and Space Technologies (RAST);2023-06-07

3. A preface to the special issue on optimization in space engineering;Optimization and Engineering;2022-12-28

4. Enabling intelligent onboard guidance, navigation, and control using reinforcement learning on near-term flight hardware;Acta Astronautica;2022-10

5. Reinforcement learning in spacecraft control applications: Advances, prospects, and challenges;Annual Reviews in Control;2022