Here’s What I’ve Learned: Asking Questions that Reveal Reward Learning

Published:2022-09-08 Issue:4 Volume:11 Page:1-28
ISSN:2573-9522
Container-title:ACM Transactions on Human-Robot Interaction
language:en
Short-container-title:J. Hum.-Robot Interact.

Author:

Habibian Soheil¹, Jonnavittula Ananth¹, Losey Dylan P.¹

Affiliation:

1. Virginia Tech, Blacksburg, VA

Abstract

Robots can learn from humans by asking questions. In these questions, the robot demonstrates a few different behaviors and asks the human for their favorite. But how should robots choose which questions to ask? Today’s robots optimize for informative questions that actively probe the human’s preferences as efficiently as possible. But while informative questions make sense from the robot’s perspective, human onlookers may find them arbitrary and misleading. For example, consider an assistive robot learning to put away the dishes. Based on your answers to previous questions, this robot knows where it should stack each dish; however, the robot is unsure about the right height to carry these dishes. A robot optimizing only for informative questions focuses purely on this height: it shows trajectories that carry the plates near or far from the table, regardless of whether or not they stack the dishes correctly. As a result, when we see this question, we mistakenly think that the robot is still confused about where to stack the dishes! In this article, we formalize active preference-based learning from the human’s perspective. We hypothesize that, from the human’s point of view, the robot’s questions reveal what the robot has and has not learned. Our insight enables robots to use questions to make their learning process transparent to the human operator. We develop and test a model that robots can leverage to relate the questions they ask to the information these questions reveal. We then introduce a tradeoff between informative and revealing questions that considers both human and robot perspectives: a robot that optimizes for this tradeoff actively gathers information from the human while simultaneously keeping the human up to date with what it has learned. We evaluate our approach across simulations, online surveys, and in-person user studies. We find that robots that consider the human’s point of view learn just as quickly as state-of-the-art baselines while also communicating what they have learned to the human operator. Videos of our user studies and results are available here: https://youtu.be/tC6y_jHN7Vw.

Publisher

Association for Computing Machinery (ACM)

Subject

Artificial Intelligence, Human-Computer Interaction

Link

https://dl.acm.org/doi/pdf/10.1145/3526107

Reference 53 articles.

1. Apprenticeship learning via inverse reinforcement learning

2. Keyframe-based Learning from Demonstration

3. Explainable Artificial Intelligence (XAI): Concepts, taxonomies, opportunities and challenges toward responsible AI

4. Andrea Bajcsy, Dylan P. Losey, Marcia K. O’Malley, and Anca D. Dragan. 2018. Learning from physical human corrections, one feature at a time. In Proceedings of the ACM/IEEE International Conference on Human-Robot Interaction. 141–149.

Cited by 11 articles.

1. LIMIT: Learning Interfaces to Maximize Information Transfer;ACM Transactions on Human-Robot Interaction;2024-08-03

2. FARPLS: A Feature-Augmented Robot Trajectory Preference Labeling System to Assist Human Labelers’ Preference Elicitation;Proceedings of the 29th International Conference on Intelligent User Interfaces;2024-03-18

3. Scalarizing Multi-Objective Robot Planning Problems Using Weighted Maximization;IEEE Robotics and Automation Letters;2024-03

4. Regret-Based Sampling of Pareto Fronts for Multiobjective Robot Planning Problems;IEEE Transactions on Robotics;2024

5. User Interface Interventions for Improving Robot Learning from Demonstration;International Conference on Human-Agent Interaction;2023-12-04