Efficient Reinforcement Learning from Demonstration via Bayesian Network-Based Knowledge Extraction

Author:

Zhang Yichuan¹, Lan Yixing¹, Fang Qiang¹, Xu Xin¹, Li Junxiang¹, Zeng Yujun¹

Affiliation:

1. College of Intelligence Science and Technology, National University of Defense Technology, Changsha, China

Abstract

Reinforcement learning from demonstration (RLfD) is a promising approach for improving reinforcement learning (RL) by leveraging expert demonstrations as additional decision-making guidance. However, most existing RLfD methods treat demonstrations only as low-level knowledge instances tied to a specific task: they typically use demonstrations either to provide additional rewards or to pretrain a neural network-based RL policy in a supervised manner, which often results in poor generalization and weak robustness. Since human knowledge is both interpretable and well suited to generalization, we propose to exploit the potential of demonstrations by extracting knowledge from them via Bayesian networks, and we develop a novel RLfD method called Reinforcement Learning from demonstration via Bayesian Network-based Knowledge (RLBNK). RLBNK uses the node influence with Wasserstein distance metric (NIW) algorithm to obtain abstract concepts from demonstrations; a Bayesian network then performs knowledge learning and inference on the resulting abstract data set, yielding a coarse policy with an associated confidence. When the coarse policy's confidence is low, an RL-based refine module further optimizes and fine-tunes the policy to form a (near-)optimal hybrid policy. Experimental results show that RLBNK improves the learning efficiency of the corresponding baseline RL algorithms under both normal and sparse reward settings. Furthermore, we demonstrate that RLBNK delivers better generalization and robustness than the baseline methods.
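The abstract describes a hybrid decision rule: act from the Bayesian-network-based coarse policy when its inference confidence is high, and defer to the RL-based refine module otherwise. The paper's interfaces are not given here, so the sketch below is purely illustrative: the coarse policy is stubbed as a lookup from abstract states to (action, confidence) pairs standing in for Bayesian-network inference, `rl_policy` stands in for the refine module, and all names and the threshold value are assumptions.

```python
from dataclasses import dataclass
from typing import Callable, Dict, Hashable, Tuple

@dataclass
class HybridPolicy:
    """Toy sketch of confidence-gated action selection (not the paper's API)."""
    # Stand-in for Bayesian-network inference over concepts extracted from
    # demonstrations: maps an abstract state to (action, posterior confidence).
    coarse_table: Dict[Hashable, Tuple[str, float]]
    # Stand-in for the RL-based refine module's policy.
    rl_policy: Callable[[Hashable], str]
    confidence_threshold: float = 0.8  # assumed value for illustration

    def act(self, state: Hashable) -> str:
        action, conf = self.coarse_table.get(state, (None, 0.0))
        if action is not None and conf >= self.confidence_threshold:
            return action              # confident: trust the coarse policy
        return self.rl_policy(state)   # low confidence: defer to the RL module

# Usage: one abstract state with a confident coarse action, one without.
policy = HybridPolicy(
    coarse_table={"near_goal": ("forward", 0.95), "ambiguous": ("left", 0.4)},
    rl_policy=lambda s: "explore",
)
print(policy.act("near_goal"))  # -> forward  (coarse policy used)
print(policy.act("ambiguous"))  # -> explore  (RL fallback used)
```

The gating design means the interpretable Bayesian-network policy handles states it has seen evidence for, while the RL component covers the remainder and refines the overall behavior.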

Funder

National Natural Science Foundation of China

Publisher

Hindawi Limited

Subject

General Mathematics, General Medicine, General Neuroscience, General Computer Science


Cited by 8 articles.

1. Bayesian Strategy Networks Based Soft Actor-Critic Learning;ACM Transactions on Intelligent Systems and Technology;2024-03-29

2. Aligning Human and Robot Representations;Proceedings of the 2024 ACM/IEEE International Conference on Human-Robot Interaction;2024-03-11

3. Fingerprint Networked Reinforcement Learning via Multiagent Modeling for Improving Decision Making in an Urban Food–Energy–Water Nexus;IEEE Transactions on Systems, Man, and Cybernetics: Systems;2023-07

4. Intelligent techniques in e-learning: a literature review;Artificial Intelligence Review;2023-06-14

5. A Method for High-Value Driving Demonstration Data Generation Based on One-Dimensional Deep Convolutional Generative Adversarial Networks;Electronics;2022-10-31
