Safe Reinforcement Learning for Arm Manipulation with Constrained Markov Decision Process

Authors:

Patrick Adjei 1, Norman Tasfi 1, Santiago Gomez-Rosero 1, Miriam A. M. Capretz 1

Affiliation:

1. Electrical and Computer Engineering, Western University, London, ON N6A 3K7, Canada

Abstract

In a world of human–robot coexistence, ensuring safe interactions is crucial. Traditional logic-based methods often lack the intuition required for robots, particularly in complex environments where these methods fail to account for all possible scenarios. Reinforcement learning has shown promise in robotics because it adapts far better than hand-crafted logic; however, its exploratory nature can jeopardize safety. This paper addresses the challenge of planning trajectories for robotic arm manipulators in dynamic environments and highlights the pitfalls of composite reward formulations, which are susceptible to reward hacking. A novel method with a simplified reward and constraint formulation is proposed, enabling the robot arm to avoid a nonstationary obstacle that never resets and thereby enhancing operational safety. The proposed approach combines scalarized expected returns with a constrained Markov decision process through a Lagrange multiplier. The scalarization component uses the indicator cost function value, sampled directly from the replay buffer, as an additional scaling factor. This makes the method particularly effective in dynamic environments where conditions change continually, outperforming approaches that rely solely on the expected cost scaled by a Lagrange multiplier.
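The abstract's combination of return maximization with a cost constraint matches the standard Lagrangian relaxation of a constrained Markov decision process. The paper's exact objective is not reproduced on this page, so the following is a minimal sketch in conventional CMDP notation, assuming a discounted reward r, an indicator cost c ∈ {0, 1}, a cost budget d, and a Lagrange multiplier λ (all notation is ours, not the authors'):

```latex
\max_{\pi}\ \min_{\lambda \ge 0}\quad
\mathbb{E}_{\pi}\!\left[\sum_{t=0}^{\infty} \gamma^{t}\, r(s_t, a_t)\right]
\;-\; \lambda \left( \mathbb{E}_{\pi}\!\left[\sum_{t=0}^{\infty} \gamma^{t}\, c(s_t, a_t)\right] - d \right)
```

Under this reading, the scalarization described in the abstract would let the per-transition indicator cost also scale the return term directly, rather than acting only through the λ-weighted expected cost.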
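For concreteness, below is a minimal PyTorch-style sketch of one way a per-sample indicator cost drawn from the replay buffer could scale the return term of such a Lagrangian loss. All names (lagrangian_actor_loss, q_reward, q_cost, cost_indicator) and the specific (1 − c) scaling are illustrative assumptions, not the authors' implementation:

```python
import torch

def lagrangian_actor_loss(q_reward, q_cost, cost_indicator, lmbda):
    """Sketch of a scalarized Lagrangian actor loss (assumed form).

    q_reward:       critic estimate of the discounted return, shape (B,)
    q_cost:         critic estimate of the discounted cost, shape (B,)
    cost_indicator: 0/1 cost flags sampled with the minibatch, shape (B,)
    lmbda:          current Lagrange multiplier, a scalar tensor
    """
    # Assumed scalarization: transitions flagged as unsafe by the sampled
    # indicator cost contribute less return to the objective.
    scaled_return = (1.0 - cost_indicator) * q_reward
    # Standard Lagrangian penalty on the cost critic.
    objective = scaled_return - lmbda * q_cost
    # The actor maximizes the objective, so the loss is its negation.
    return -objective.mean()

def dual_update(lmbda, batch_cost, budget, lr=1e-2):
    # Projected gradient ascent on the multiplier: lambda grows while the
    # observed cost exceeds the budget d, and is clipped at zero.
    with torch.no_grad():
        return (lmbda + lr * (batch_cost.mean() - budget)).clamp(min=0.0)
```

Because the indicator cost is read per transition from the buffer, the scaling reacts immediately as the nonstationary obstacle moves, whereas a penalty on the expected cost responds only through the slower dual update; this is one plausible reading of the advantage the abstract claims in continually changing environments.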

Funder

Natural Sciences and Engineering Research Council

Publisher

MDPI AG

