Reinforcement Learning from Imperfect Demonstrations under Soft Expert Guidance

Published:2020-04-03 Issue:04 Volume:34 Page:5109-5116
ISSN:2374-3468
Container-title:Proceedings of the AAAI Conference on Artificial Intelligence
Short-container-title:AAAI

Author:

Jing Mingxuan, Ma Xiaojian, Huang Wenbing, Sun Fuchun, Yang Chao, Fang Bin, Liu Huaping

Abstract

In this paper, we study Reinforcement Learning from Demonstrations (RLfD), which improves the exploration efficiency of Reinforcement Learning (RL) by providing expert demonstrations. Most existing RLfD methods require demonstrations to be perfect and sufficient, a requirement that is unrealistic in practice. To work with imperfect demonstrations, we first formally define an imperfect expert setting for RLfD, and then point out that previous methods suffer from two issues, concerning optimality and convergence, respectively. Building on our theoretical findings, we tackle these two issues by treating the expert guidance as a soft constraint that regulates the agent's policy exploration, which leads to a constrained optimization problem. We further demonstrate that this problem can be solved efficiently by performing a local linear search on its dual form. Extensive empirical evaluations on a comprehensive collection of benchmarks indicate that our method attains consistent improvement over other RLfD counterparts.

Publisher

Association for the Advancement of Artificial Intelligence (AAAI)

Subject

General Medicine

Cited by 25 articles.

1. Learning from demonstration for autonomous generation of robotic trajectory: Status quo and forward-looking overview;Advanced Engineering Informatics;2024-10

2. Adversarial Batch Inverse Reinforcement Learning: Learn to Reward from Imperfect Demonstration for Interactive Recommendation;2024 27th International Conference on Computer Supported Cooperative Work in Design (CSCWD);2024-05-08

3. An online hyper‐volume action bounding approach for accelerating the process of deep reinforcement learning from multiple controllers;Journal of Field Robotics;2024-04-28

4. Learning and Repair of Deep Reinforcement Learning Policies from Fuzz-Testing Data;Proceedings of the IEEE/ACM 46th International Conference on Software Engineering;2024-02-06

5. Imitation Learning Method of Multi-quality Expert Data Based on GAIL;2023 China Automation Congress (CAC);2023-11-17