Unified Learning from Demonstrations, Corrections, and Preferences during Physical Human-Robot Interaction

Authors:

Mehta Shaunak A.¹, Losey Dylan P.¹

Affiliation:

1. Department of Mechanical Engineering, Virginia Tech, USA

Abstract

Humans can leverage physical interaction to teach robot arms. This physical interaction takes multiple forms depending on the task, the user, and what the robot has learned so far. State-of-the-art approaches focus on learning from a single modality or combine some interaction types. Some methods do so by assuming that the robot has prior information about the features of the task and the reward structure. By contrast, in this paper we introduce an algorithmic formalism that unites learning from demonstrations, corrections, and preferences. Our approach makes no assumptions about the tasks the human wants to teach the robot; instead, we learn a reward model from scratch by comparing the human’s input to nearby alternatives, i.e., trajectories close to the human’s feedback. We first derive a loss function that trains an ensemble of reward models to match the human’s demonstrations, corrections, and preferences. The type and order of feedback is up to the human teacher: we enable the robot to collect this feedback passively or actively. We then apply constrained optimization to convert our learned reward into a desired robot trajectory. Through simulations and a user study we demonstrate that our proposed approach learns manipulation tasks from physical human interaction more accurately than existing baselines, particularly when the robot is faced with new or unexpected objectives. Videos of our user study are available at: https://youtu.be/FSUJsTYvEKU
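To make the abstract's learning scheme concrete, below is a minimal sketch of the core idea: each demonstration, correction, or preference is treated as evidence that the human's input should out-score nearby alternative trajectories, and an ensemble of reward networks is trained from scratch with a Bradley-Terry style comparison loss. This is not the authors' released implementation; the network architecture, the perturbation scheme for generating nearby alternatives, and all hyperparameters below are illustrative assumptions, written in Python with PyTorch.

# Hedged sketch of reward learning from physical human feedback.
# Not the authors' code: architecture, perturbation scale, and
# hyperparameters are all illustrative assumptions.

import torch
import torch.nn as nn

class RewardNet(nn.Module):
    """Maps a flattened trajectory to a scalar reward (learned from scratch)."""
    def __init__(self, traj_dim: int, hidden: int = 128):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(traj_dim, hidden), nn.Tanh(),
            nn.Linear(hidden, hidden), nn.Tanh(),
            nn.Linear(hidden, 1),
        )

    def forward(self, traj: torch.Tensor) -> torch.Tensor:
        return self.net(traj).squeeze(-1)

def nearby_alternatives(traj: torch.Tensor, n: int = 16, scale: float = 0.05):
    """Perturb the human's input to get trajectories it should out-score."""
    noise = scale * torch.randn(n, traj.shape[-1])
    return traj.unsqueeze(0) + noise

def comparison_loss(model: RewardNet, preferred: torch.Tensor,
                    alternatives: torch.Tensor) -> torch.Tensor:
    """Bradley-Terry loss: the human's trajectory should score higher
    than each nearby alternative."""
    r_pref = model(preferred)      # scalar reward of the human's input
    r_alt = model(alternatives)    # rewards of the (n,) alternatives
    return -torch.log(torch.sigmoid(r_pref - r_alt)).mean()

# Train an ensemble so the robot can also estimate its own uncertainty.
traj_dim = 3 * 10                  # e.g., 10 waypoints in xyz (assumed)
ensemble = [RewardNet(traj_dim) for _ in range(5)]
optimizers = [torch.optim.Adam(m.parameters(), lr=1e-3) for m in ensemble]

human_traj = torch.randn(traj_dim) # stand-in for a real demonstration,
                                   # correction, or preferred trajectory
for model, opt in zip(ensemble, optimizers):
    alts = nearby_alternatives(human_traj)
    loss = comparison_loss(model, human_traj, alts)
    opt.zero_grad()
    loss.backward()
    opt.step()

The sketch covers only the learning stage; it omits the paper's second stage, in which constrained optimization converts the learned (ensemble-averaged) reward into the robot's next trajectory.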

Publisher

Association for Computing Machinery (ACM)

Subject

Artificial Intelligence, Human-Computer Interaction


Cited by 2 articles.