Safe Exploration and Optimization of Constrained MDPs Using Gaussian Processes

Published: 2018-04-26
Volume: 32, Issue: 1
ISSN: 2374-3468
Container title: Proceedings of the AAAI Conference on Artificial Intelligence
Short container title: AAAI

Authors:

Wachi Akifumi, Sui Yanan, Yue Yisong, Ono Masahiro

Abstract

We present a reinforcement learning approach to explore and optimize a safety-constrained Markov Decision Process (MDP). In this setting, the agent must maximize discounted cumulative reward while constraining the probability of entering unsafe states, where a state counts as safe only if its safety-function value lies within a prescribed tolerance. The safety values of the states are not known a priori, so we model them probabilistically with a Gaussian Process (GP) prior. Behaving properly in such an environment therefore requires balancing a three-way trade-off: exploring the safety function, exploring the reward function, and exploiting acquired knowledge to maximize reward. We propose a novel approach to balance this trade-off. Specifically, our approach explores unvisited states selectively; that is, it prioritizes exploring a state if visiting it would significantly improve knowledge of the achievable cumulative reward. Our approach relies on a novel information gain criterion based on GP representations of the reward and safety functions. We demonstrate the effectiveness of our approach on a range of experiments, including a simulation using real Martian terrain data.
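The abstract describes the safe set and the exploration trade-off only in words. The block below is a minimal sketch under stated assumptions, not the authors' algorithm: it assumes a zero-mean GP with a squared-exponential kernel over state coordinates, and a state is treated as safe when the GP lower confidence bound on its safety value is at least a threshold `h` (the names `h`, `beta`, `next_state`, the kernel, and all numbers are hypothetical). Among safe, unvisited states it picks the one whose observation would be most informative about the safety and reward GPs, scored with the standard Gaussian mutual information 0.5·log(1 + σ²/σ_n²). That score is swapped in as a simple stand-in; it is not the paper's information gain criterion, and the sketch ignores MDP transitions, reachability, and discounted return.

```python
import numpy as np


def rbf_kernel(a, b, lengthscale=1.0, variance=1.0):
    """Squared-exponential kernel between point sets a (n, d) and b (m, d)."""
    sq_dist = np.sum((a[:, None, :] - b[None, :, :]) ** 2, axis=-1)
    return variance * np.exp(-0.5 * sq_dist / lengthscale**2)


def gp_posterior(x_train, y_train, x_test, noise_var=1e-2, lengthscale=1.0, variance=1.0):
    """Posterior mean and standard deviation of a zero-mean GP at x_test."""
    K = rbf_kernel(x_train, x_train, lengthscale, variance) + noise_var * np.eye(len(x_train))
    K_s = rbf_kernel(x_train, x_test, lengthscale, variance)  # shape (n_train, n_test)
    L = np.linalg.cholesky(K)
    alpha = np.linalg.solve(L.T, np.linalg.solve(L, y_train))  # K^{-1} y
    mean = K_s.T @ alpha
    v = np.linalg.solve(L, K_s)
    var = variance - np.sum(v**2, axis=0)  # prior variance minus explained part
    return mean, np.sqrt(np.maximum(var, 0.0))


def safe_set(mean, std, h, beta=2.0):
    """Boolean mask: lower confidence bound on the safety value is at least h."""
    return mean - beta * std >= h


def info_gain(std, noise_var=1e-2):
    """Mutual information between f(s) and one noisy Gaussian observation of it."""
    return 0.5 * np.log1p(std**2 / noise_var)


def next_state(states, visited_idx, g_obs, r_obs, h=0.0, beta=2.0,
               safety_lengthscale=3.0, reward_lengthscale=1.0):
    """Hypothetical rule: most informative unvisited state in the safe set.

    Ignores transitions, reachability and return paths; this is not the
    paper's information gain criterion.
    """
    x_train = states[visited_idx]
    g_mean, g_std = gp_posterior(x_train, g_obs, states, lengthscale=safety_lengthscale)
    _, r_std = gp_posterior(x_train, r_obs, states, lengthscale=reward_lengthscale)
    safe = safe_set(g_mean, g_std, h, beta)
    candidates = safe.copy()
    candidates[visited_idx] = False  # only unvisited states are exploration targets
    if not candidates.any():
        return None, safe
    score = np.where(candidates, info_gain(g_std) + info_gain(r_std), -np.inf)
    return int(np.argmax(score)), safe


# Hypothetical 1-D corridor of 10 states; only states 0 and 1 have been visited.
states = np.arange(10, dtype=float).reshape(-1, 1)
visited_idx = np.array([0, 1])
g_obs = np.array([1.0, 0.9])  # hypothetical safety observations
r_obs = np.array([0.0, 0.2])  # hypothetical reward observations
s_next, safe = next_state(states, visited_idx, g_obs, r_obs)
print(s_next, np.flatnonzero(safe))  # prints: 2 [0 1 2]
```

In this toy corridor only the state adjacent to the visited ones has a lower confidence bound above `h`, so the safe set grows one step at a time. A longer kernel lengthscale or a smaller `beta` tends to expand it faster, at the cost of admitting states whose safety is less certain.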

Publisher

Association for the Advancement of Artificial Intelligence (AAAI)

Subject

General Medicine

Cited by 17 articles.

1. Planning under uncertainty for safe robot exploration using Gaussian process prediction. Autonomous Robots, 2024-08-28.

2. MoDem-V2: Visuo-Motor World Models for Real-World Robot Manipulation. 2024 IEEE International Conference on Robotics and Automation (ICRA), 2024-05-13.

3. Real-Time Optimization of Fuel Cell Cogeneration Systems with Safety-Aware Self-Learning Algorithms. 2024 UKACC 14th International Conference on Control (CONTROL), 2024-04-10.

4. Concurrent Learning of Control Policy and Unknown Safety Specifications in Reinforcement Learning. IEEE Open Journal of Control Systems, 2024.

5. A Computationally Lightweight Safe Learning Algorithm. 2023 62nd IEEE Conference on Decision and Control (CDC), 2023-12-13.