Learning from Demonstrations and Human Evaluative Feedbacks: Handling Sparsity and Imperfection Using Inverse Reinforcement Learning Approach

Author:

Mourad Nafee¹, Ezzeddine Ali², Nadjar Araabi Babak², Nili Ahmadabadi Majid¹

Affiliation:

1. Cognitive Systems Laboratory, School of ECE, College of Engineering, University of Tehran, Tehran, Iran

2. Machine Learning and Computational Modeling Laboratory, School of ECE, College of Engineering, University of Tehran, Tehran, Iran

Abstract

Programming by demonstration is one of the most efficient methods of knowledge transfer for developing advanced learning systems, provided that teachers deliver abundant and correct demonstrations and learners perceive them correctly. Nevertheless, demonstrations are sparse and inaccurate in almost all real-world problems, and complementary information is needed to compensate for these shortcomings. In this paper, we target programming by a combination of nonoptimal and sparse demonstrations and a limited number of binary evaluative feedbacks, where the learner uses its own evaluated experiences as new demonstrations in an extended inverse reinforcement learning method. This provides the learner with broader generalization and lower regret, as well as robustness in the face of sparsity and nonoptimality in demonstrations and feedbacks. Our method alleviates the unrealistic burden on teachers to provide optimal and abundant demonstrations. Employing evaluative feedback, which is easy for teachers to deliver, provides the opportunity to correct the learner's behavior in an interactive social setting without requiring teachers to know and use an accurate reward function of their own. Here, we enhance inverse reinforcement learning (IRL) to estimate the reward function from a mixture of nonoptimal and sparse demonstrations and evaluative feedbacks. Our method, called IRL from demonstration and human's critique (IRLDC), has two phases. The teacher first provides some demonstrations with which the learner initializes its policy. Next, the learner interacts with the environment and the teacher provides binary evaluative feedbacks. Taking into account possible inconsistencies and mistakes in issuing and receiving feedbacks, the learner revises the estimated reward function by solving a single optimization problem. IRLDC is devised to handle errors and sparsity in demonstrations and feedbacks and can generalize over different combinations of these two sources of expertise.
We apply our method to three domains: a simulated navigation task, a simulated car-driving problem with human interactions, and a navigation experiment with a mobile robot. The results indicate that IRLDC significantly enhances the learning process where standard IRL methods fail and learning-from-feedback (LfF) methods incur high regret. Moreover, IRLDC performs well at different levels of sparsity and optimality of the teacher's demonstrations and feedbacks, where other state-of-the-art methods fail.
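The abstract's two-phase scheme can be illustrated with a minimal sketch. This is not the authors' implementation; it assumes a linear reward r(s) = w·φ(s) and folds the teacher's demonstrations and the learner's own feedback-labeled trajectories into one regularized objective, with a confidence weight `c` (a hypothetical parameter introduced here) discounting possibly noisy feedbacks:

```python
def feature_counts(trajectory, phi):
    """Sum the feature vector phi(s) over all states visited in a trajectory."""
    mu = [0.0] * len(phi(trajectory[0]))
    for s in trajectory:
        for i, v in enumerate(phi(s)):
            mu[i] += v
    return mu

def irldc_reward_weights(demos, evaluated, phi, c=0.5, lam=1.0):
    """Hedged sketch of a combined reward-estimation step:
         max_w  sum_d w.mu_d  +  c * sum_i f_i * w.mu_i  -  lam * ||w||^2
       where mu_d are feature counts of demonstrations and (mu_i, f_i) come
       from the learner's own trajectories labeled by binary feedback
       f_i in {+1, -1}. The maximizer has the closed form
         w = (sum_d mu_d + c * sum_i f_i * mu_i) / (2 * lam).
       `demos` is a list of trajectories (lists of states); `evaluated` is a
       list of (trajectory, feedback) pairs."""
    g = [0.0] * len(phi(demos[0][0]))
    for traj in demos:
        for i, v in enumerate(feature_counts(traj, phi)):
            g[i] += v
    for traj, f in evaluated:
        for i, v in enumerate(feature_counts(traj, phi)):
            g[i] += c * f * v
    return [v / (2.0 * lam) for v in g]

# Toy usage on a 4-state chain with one-hot features: one demonstration
# reaching the goal, one positively and one negatively evaluated rollout.
phi = lambda s: [1.0 if s == j else 0.0 for j in range(4)]
w = irldc_reward_weights(
    demos=[[0, 1, 2, 3]],
    evaluated=[([0, 1, 1, 2], +1), ([0, 0, 0, 0], -1)],
    phi=phi, c=0.5, lam=0.5)
# States visited by demonstrations and positively evaluated trajectories
# receive higher estimated reward; states of the negatively evaluated
# rollout are pushed down.
```

The point of the single combined objective, as the abstract emphasizes, is that demonstrations and feedbacks need not be individually reliable: each source only shifts the estimate in proportion to its assigned weight.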

Publisher

Hindawi Limited

Subject

General Computer Science,Control and Systems Engineering

References: 17 articles.

Cited by 14 articles.

1. Data-Driven Policy Learning Methods from Biological Behavior: A Systematic Review;Applied Sciences;2024-05-09

2. Boosting Feedback Efficiency of Interactive Reinforcement Learning by Adaptive Learning from Scores;2023 IEEE/RSJ International Conference on Intelligent Robots and Systems (IROS);2023-10-01

3. Calibrated Human-Robot Teaching: What People Do When Teaching Norms to Robots*;2023 32nd IEEE International Conference on Robot and Human Interactive Communication (RO-MAN);2023-08-28

4. Personalization of Hearing AID DSLV5 Prescription Amplification in the Field via a Real-Time Smartphone APP;2023 24th International Conference on Digital Signal Processing (DSP);2023-06-11

5. A Real-Time Smartphone App for Field Personalization of Hearing Enhancement by Adaptive Dynamic Range Optimization;2023 5th International Congress on Human-Computer Interaction, Optimization and Robotic Applications (HORA);2023-06-08


