Learning reward functions from diverse sources of human feedback: Optimally integrating demonstrations and preferences

Published:2021-08-28 Issue:1 Volume:41 Page:45-67
ISSN:0278-3649
Container-title:The International Journal of Robotics Research
language:en
Short-container-title:The International Journal of Robotics Research

Author:

Bıyık Erdem¹, Losey Dylan P.², Palan Malayandi², Landolfi Nicholas C.², Shevchuk Gleb², Sadigh Dorsa¹²

Affiliation:

1. Department of Electrical Engineering, Stanford University, Stanford, CA, USA

2. Department of Computer Science, Stanford University, Stanford, CA, USA

Abstract

Reward functions are a common way to specify the objective of a robot. As designing reward functions can be extremely challenging, a more promising approach is to directly learn reward functions from human teachers. Importantly, data from human teachers can be collected either passively or actively in a variety of forms: passive data sources include demonstrations (e.g., kinesthetic guidance), whereas preferences (e.g., comparative rankings) are actively elicited. Prior research has independently applied reward learning to these different data sources. However, there exist many domains where multiple sources are complementary and expressive. Motivated by this general problem, we present a framework to integrate multiple sources of information, which are either passively or actively collected from human users. In particular, we present an algorithm that first utilizes user demonstrations to initialize a belief about the reward function, and then actively probes the user with preference queries to zero in on their true reward. This algorithm not only enables us to combine multiple data sources, but it also informs the robot when it should leverage each type of information. Further, our approach accounts for the human’s ability to provide data, yielding user-friendly preference queries which are also theoretically optimal. Our extensive simulated experiments and user studies on a Fetch mobile manipulator demonstrate the superiority and the usability of our integrated framework.
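
The two-stage idea in the abstract (demonstrations initialize a belief, preference queries then refine it) can be illustrated with a minimal sketch. The abstract does not give the model details, so everything below is an assumption made for illustration: a linear reward w·φ over two hypothetical trajectory features, a belief stored as weights over a grid of unit-norm w, a Boltzmann demonstration likelihood normalized over a small candidate set, a logistic preference likelihood, and query selection by expected information gain. All function names and parameters (`learn`, `demo_update`, `beta`, and so on) are hypothetical and are not taken from the paper.

```python
import itertools
import math


def dot(u, v):
    return sum(a * b for a, b in zip(u, v))


def normalize(weights):
    total = sum(weights)
    return [x / total for x in weights]


def init_belief(n=180):
    # Assumption: reward weights w lie on the unit circle; uniform prior.
    ws = [(math.cos(2 * math.pi * k / n), math.sin(2 * math.pi * k / n))
          for k in range(n)]
    return ws, [1.0 / n] * n


def demo_update(ws, belief, demo_phi, candidates, beta=2.0):
    # Assumed Boltzmann model: P(demo | w) is a softmax over the candidate set.
    updated = []
    for w, b in zip(ws, belief):
        z = sum(math.exp(beta * dot(w, phi)) for phi in candidates)
        updated.append(b * math.exp(beta * dot(w, demo_phi)) / z)
    return normalize(updated)


def pref_prob(w, phi_a, phi_b):
    # Assumed logistic model: probability that the user prefers A over B.
    return 1.0 / (1.0 + math.exp(dot(w, phi_b) - dot(w, phi_a)))


def binary_entropy(p):
    if p <= 0.0 or p >= 1.0:
        return 0.0
    return -(p * math.log2(p) + (1 - p) * math.log2(1 - p))


def info_gain(ws, belief, phi_a, phi_b):
    # Mutual information between the user's answer and w under the belief.
    probs = [pref_prob(w, phi_a, phi_b) for w in ws]
    marginal = sum(b * p for b, p in zip(belief, probs))
    expected = sum(b * binary_entropy(p) for b, p in zip(belief, probs))
    return binary_entropy(marginal) - expected


def pref_update(ws, belief, phi_a, phi_b, chose_a):
    updated = []
    for w, b in zip(ws, belief):
        p = pref_prob(w, phi_a, phi_b)
        updated.append(b * (p if chose_a else 1.0 - p))
    return normalize(updated)


def posterior_mean(ws, belief):
    return tuple(sum(b * w[i] for w, b in zip(ws, belief)) for i in range(2))


def learn(true_w, candidates, demo_phi, n_queries=10):
    # Stage 1: the passively collected demonstration initializes the belief.
    ws, belief = init_belief()
    belief = demo_update(ws, belief, demo_phi, candidates)
    # Stage 2: actively chosen preference queries refine the belief.
    pairs = list(itertools.combinations(candidates, 2))
    for _ in range(n_queries):
        phi_a, phi_b = max(pairs, key=lambda q: info_gain(ws, belief, *q))
        # Simulated noiseless user: picks the option with higher true reward.
        chose_a = dot(true_w, phi_a) >= dot(true_w, phi_b)
        belief = pref_update(ws, belief, phi_a, phi_b, chose_a)
    return posterior_mean(ws, belief)
```

For example, calling `learn((0.6, 0.8), candidates, (0.7, 0.7))` with a handful of 2-D feature vectors as `candidates` returns a posterior mean estimate of the reward weights. The simulated user here answers without noise, which is a simplification; a real user study would replace that line with an actual human response.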

Funder

FLI

Toyota Research Institute

NSF

Publisher

SAGE Publications

Subject

Applied Mathematics, Artificial Intelligence, Electrical and Electronic Engineering, Mechanical Engineering, Modeling and Simulation, Software

Link

http://journals.sagepub.com/doi/pdf/10.1177/02783649211041652
