Learning reward functions from diverse sources of human feedback: Optimally integrating demonstrations and preferences

Author:

Erdem Bıyık1, Dylan P. Losey2, Malayandi Palan2, Nicholas C. Landolfi2, Gleb Shevchuk2, Dorsa Sadigh1,2

Affiliation:

1. Department of Electrical Engineering, Stanford University, Stanford, CA, USA

2. Department of Computer Science, Stanford University, Stanford, CA, USA

Abstract

Reward functions are a common way to specify the objective of a robot. As designing reward functions can be extremely challenging, a more promising approach is to directly learn reward functions from human teachers. Importantly, data from human teachers can be collected either passively or actively in a variety of forms: passive data sources include demonstrations (e.g., kinesthetic guidance), whereas preferences (e.g., comparative rankings) are actively elicited. Prior research has independently applied reward learning to these different data sources. However, there exist many domains where multiple sources are complementary and expressive. Motivated by this general problem, we present a framework to integrate multiple sources of information, which are either passively or actively collected from human users. In particular, we present an algorithm that first utilizes user demonstrations to initialize a belief about the reward function, and then actively probes the user with preference queries to zero in on their true reward. This algorithm not only enables us to combine multiple data sources, but it also informs the robot when it should leverage each type of information. Further, our approach accounts for the human's ability to provide data, yielding user-friendly preference queries that are also theoretically optimal. Our extensive simulated experiments and user studies on a Fetch mobile manipulator demonstrate the superiority and usability of our integrated framework.
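The two-stage approach described in the abstract (initialize a belief over reward weights from demonstrations, then refine it with actively selected preference queries) can be illustrated with a short sketch. The code below is a minimal, hypothetical rendition assuming linear reward features, Boltzmann-rational human choice models, a sampled belief over weights, and a simple uncertainty heuristic for query selection; constants such as BETA_DEMO and BETA_PREF and the helper posterior_prob_A are illustrative stand-ins, not the paper's implementation.

```python
import numpy as np

rng = np.random.default_rng(0)

# --- Assumed setup (illustrative, not the paper's exact models) ---
# Reward is linear in trajectory features: R(xi) = w . phi(xi).
D = 4             # feature dimension (assumption)
N_SAMPLES = 5000  # belief is a set of weighted samples over w
BETA_DEMO = 5.0   # rationality coefficient for demonstrations (assumption)
BETA_PREF = 10.0  # rationality coefficient for preference answers (assumption)

def random_unit(n, d):
    """Draw n random unit vectors in R^d."""
    v = rng.normal(size=(n, d))
    return v / np.linalg.norm(v, axis=1, keepdims=True)

# Sampled belief over reward weights w (uniform prior on the unit sphere).
W = random_unit(N_SAMPLES, D)
logp = np.zeros(N_SAMPLES)

# Stage 1: initialize the belief from demonstrations with a
# Boltzmann-rational model: p(xi_demo | w) is proportional to
# exp(BETA_DEMO * w . phi(xi_demo)).
demo_features = [rng.normal(size=D) for _ in range(3)]  # stand-in features
for phi in demo_features:
    logp += BETA_DEMO * (W @ phi)

w_true = random_unit(1, D)[0]  # simulated human's hidden reward weights

def posterior_prob_A(phi_a, phi_b):
    """P(human prefers A over B) under the current sampled belief."""
    p_w = np.exp(logp - logp.max()); p_w /= p_w.sum()
    p_a_given_w = 1.0 / (1.0 + np.exp(-BETA_PREF * (W @ (phi_a - phi_b))))
    return float(p_w @ p_a_given_w)

# Stage 2: actively query preferences. Each round, pick the candidate pair
# whose answer the current belief is most uncertain about (a simple
# max-uncertainty proxy for an information-gain criterion).
for step in range(10):
    cands = rng.normal(size=(20, D))  # stand-in candidate trajectory features
    best = max(((a, b) for a in range(20) for b in range(a + 1, 20)),
               key=lambda ab: -abs(posterior_prob_A(cands[ab[0]],
                                                    cands[ab[1]]) - 0.5))
    phi_a, phi_b = cands[best[0]], cands[best[1]]

    # Simulated human answers with the same softmax (Boltzmann) choice model.
    p_true = 1.0 / (1.0 + np.exp(-BETA_PREF * (w_true @ (phi_a - phi_b))))
    prefers_a = rng.random() < p_true

    # Bayesian update of the sampled belief (log sigmoid of the chosen side).
    diff = W @ (phi_a - phi_b)
    logp += (-np.logaddexp(0.0, -BETA_PREF * diff) if prefers_a
             else -np.logaddexp(0.0, BETA_PREF * diff))

# Posterior-mean estimate of the reward weights.
p_w = np.exp(logp - logp.max()); p_w /= p_w.sum()
w_hat = p_w @ W
print("alignment with true w:", float(w_hat @ w_true / np.linalg.norm(w_hat)))
```

In the paper's framework, query selection is driven by a principled optimality criterion and the candidates are real robot trajectories; the sketch replaces both with random features and a max-uncertainty heuristic to stay self-contained.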

Funder

FLI

Toyota Research Institute

NSF

Publisher

SAGE Publications

Subject

Applied Mathematics, Artificial Intelligence, Electrical and Electronic Engineering, Mechanical Engineering, Modeling and Simulation, Software

Cited by 18 articles. A selection of citing articles:

1. Learning Interpretable Models of Aircraft Handling Behaviour by Reinforcement Learning from Human Feedback; AIAA SCITECH 2024 Forum; 2024-01-04

2. Active preference-based Gaussian process regression for reward learning and optimization; The International Journal of Robotics Research; 2023-11-07

3. Reward Learning With Intractable Normalizing Functions; IEEE Robotics and Automation Letters; 2023-11

4. VARIQuery: VAE Segment-Based Active Learning for Query Selection in Preference-Based Reinforcement Learning; 2023 IEEE/RSJ International Conference on Intelligent Robots and Systems (IROS); 2023-10-01

5. Bootstrapping Adaptive Human-Machine Interfaces with Offline Reinforcement Learning; 2023 IEEE/RSJ International Conference on Intelligent Robots and Systems (IROS); 2023-10-01
