Run Time Assured Reinforcement Learning for Safe Satellite Docking

Published: 2023-01 Issue: 1 Volume: 20 Page: 25-36
ISSN: 1940-3151
Container-title: Journal of Aerospace Information Systems
Language: en
Short-container-title: Journal of Aerospace Information Systems

Author:

Dunlap Kyle¹, Mote Mark², Delsing Kaiden³, Hobbs Kerianne L.⁴

Affiliation:

1. Parallax Advanced Research, Beavercreek, Ohio 45431

2. Pytheia Corporation, Atlanta, Georgia 30308

3. Cedarville University, Cedarville, Ohio 45314

4. U.S. Air Force Research Laboratory, Dayton, Ohio 45431

Abstract

Reinforcement learning promises high performance in complex tasks as well as low online storage and computation cost. However, the trial-and-error learning approach of reinforcement learning could explore unsafe behavior in the search for an optimal solution. Run time assurance (RTA) approaches can be applied to monitor behavior and ensure safety constraint satisfaction during reinforcement learning. This paper investigates the effect of RTA on reinforcement learning training performance in terms of training efficiency, safety constraint satisfaction, control efficiency, task efficiency, and training duration. For the purposes of demonstration, a custom reinforcement learning environment is created where the objective is to develop a policy that moves a satellite into docking position with another satellite in a two-dimensional relative-motion reference frame. Six different policies are trained. The first features no RTA, the second features no RTA but a higher penalty for safety violations, and four others use different RTA techniques to enforce a dynamic velocity constraint during training. The trained policies are analyzed with standardized test points. It is shown that the policies trained without RTA do not learn to adhere to the constraint, whereas all policies trained with RTA do learn to adhere to the constraint. Although more complex RTA frameworks can be better for operational use, it is found that a simple RTA framework provides the best overall results for reinforcement learning training.

Funder

Space Research Institute for Discovery and Exploration Fellowship

Air Force Research Laboratory Innovation Pipeline Fund

Publisher

American Institute of Aeronautics and Astronautics (AIAA)

Subject

Electrical and Electronic Engineering, Computer Science Applications, Aerospace Engineering

Link

https://arc.aiaa.org/doi/pdf/10.2514/1.I011126
