Penalized Proximal Policy Optimization for Safe Reinforcement Learning

Published: 2022-07
Container-title: Proceedings of the Thirty-First International Joint Conference on Artificial Intelligence

Author:

Zhang Linrui¹, Shen Li², Yang Long³, Chen Shixiang², Wang Xueqian¹, Yuan Bo¹, Tao Dacheng²

Affiliation:

1. Tsinghua University

2. JD Explore Academy

3. Peking University

Abstract

Safe reinforcement learning aims to learn the optimal policy while satisfying safety constraints, which is essential in real-world applications. However, current algorithms still struggle to achieve efficient policy updates while satisfying hard constraints. In this paper, we propose Penalized Proximal Policy Optimization (P3O), which solves the cumbersome constrained policy iteration via a single minimization of an equivalent unconstrained problem. Specifically, P3O utilizes a simple yet effective penalty approach to eliminate cost constraints and removes the trust-region constraint by the clipped surrogate objective. We theoretically prove the exactness of the penalized method with a finite penalty factor and provide a worst-case analysis of the approximation error when evaluated on sample trajectories. Moreover, we extend P3O to more challenging multi-constraint and multi-agent scenarios, which are less studied in previous work. Extensive experiments show that P3O outperforms state-of-the-art algorithms with respect to both reward improvement and constraint satisfaction on a set of constrained locomotion tasks.
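
The two ingredients named in the abstract, a penalty term in place of the cost constraint and a clipped surrogate in place of the trust region, can be illustrated with a minimal sketch. This is not the authors' implementation: the function name, the parameter names `kappa` (penalty factor) and `eps` (clip range), their default values, and the `cost_violation` offset (how far the current policy's expected cost exceeds its limit) are all assumptions made for illustration, and the exact objective in the paper may differ.

```python
from typing import Sequence


def clip(x: float, lo: float, hi: float) -> float:
    """Clamp x to the closed interval [lo, hi]."""
    return max(lo, min(hi, x))


def penalized_clipped_loss(
    ratios: Sequence[float],      # pi_new(a|s) / pi_old(a|s) per sample
    reward_adv: Sequence[float],  # reward advantage estimates
    cost_adv: Sequence[float],    # cost advantage estimates
    cost_violation: float,        # assumed: current expected cost minus its limit
    kappa: float = 20.0,          # assumed penalty factor
    eps: float = 0.2,             # assumed clip range
) -> float:
    """Single unconstrained loss to minimize (illustrative sketch only)."""
    n = len(ratios)
    # Reward part: negated PPO-style clipped surrogate (pessimistic via min).
    reward_term = -sum(
        min(r * a, clip(r, 1.0 - eps, 1.0 + eps) * a)
        for r, a in zip(ratios, reward_adv)
    ) / n
    # Cost part: clipped surrogate of the cost (pessimistic via max),
    # shifted by how much the current policy violates the limit.
    cost_term = sum(
        max(r * a, clip(r, 1.0 - eps, 1.0 + eps) * a)
        for r, a in zip(ratios, cost_adv)
    ) / n + cost_violation
    # Penalty is active only when the estimated constraint is violated.
    return reward_term + kappa * max(0.0, cost_term)
```

In this sketch the `max(0.0, ...)` hinge is what turns the constrained update into a single unconstrained minimization: when the estimated cost is within its limit the penalty vanishes and the loss reduces to the clipped reward objective alone.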

Publisher

International Joint Conferences on Artificial Intelligence Organization

Cited by 12 articles.

1. Dual Behavior Regularized Offline Deterministic Actor–Critic;IEEE Transactions on Systems, Man, and Cybernetics: Systems;2024-08

2. Integrating Risk-Averse and Constrained Reinforcement Learning for Robust Decision-Making in High-Stakes Scenarios;Mathematics;2024-06-24

3. Adaptive Curriculum Learning With Successor Features for Imbalanced Compositional Reward Functions;IEEE Robotics and Automation Letters;2024-06

4. Cognitive intelligence in industrial robots and manufacturing;Computers & Industrial Engineering;2024-05

5. Dynamic Routing for Integrated Satellite-Terrestrial Networks: A Constrained Multi-Agent Reinforcement Learning Approach;IEEE Journal on Selected Areas in Communications;2024-05