Abstract
Reinforcement Learning (RL) enables an agent to learn effective behavior from a reward function alone. However, it faces unique challenges when the environment's state and action spaces are large and when rewards are difficult to specify. Imitation Learning (IL) offers a promising way to address these challenges by introducing a teacher: the learning process can exploit human-sourced assistance and/or control over the agent and the environment. This study considers a human teacher and an agent learner, where the teacher participates in the agent's training to help it handle the environment, tackle a specific objective, and reach a predefined goal. This paper proposes a novel approach that combines IL with different RL methods, namely State-Action-Reward-State-Action (SARSA) and Asynchronous Advantage Actor-Critic (A3C), to overcome the limitations of each stand-alone system. It addresses how to effectively leverage the teacher's feedback, whether direct binary or indirect detailed, so that the agent learner can acquire sequential decision-making policies. Results on various OpenAI Gym environments show that this algorithmic method can be incorporated in different combinations and significantly reduces both human effort and the tedious exploration process.
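To make the idea of folding a teacher's direct binary feedback into a value-based learner concrete, here is a minimal sketch of a tabular SARSA update with a shaping term. This is an illustration only, not the paper's actual algorithm: the `sarsa_update` helper, the `fb_weight` parameter, and the choice to blend feedback into the reward are all assumptions made for the example.

```python
from collections import defaultdict

def sarsa_update(Q, s, a, r, s_next, a_next,
                 alpha=0.1, gamma=0.99, teacher_fb=0.0, fb_weight=0.5):
    """One tabular SARSA update where a hypothetical binary teacher signal
    (+1 approve, -1 disapprove, 0 silent) is blended into the environment
    reward as a shaping term before the standard TD update."""
    shaped_r = r + fb_weight * teacher_fb
    td_error = shaped_r + gamma * Q[(s_next, a_next)] - Q[(s, a)]
    Q[(s, a)] += alpha * td_error
    return Q[(s, a)]

# Example: one step in which the teacher approves the chosen action,
# so the effective reward is r + fb_weight = 1.0 + 0.5 = 1.5.
Q = defaultdict(float)
sarsa_update(Q, "s0", "left", 1.0, "s1", "right", teacher_fb=+1.0)
```

Treating feedback as reward shaping keeps the underlying RL update untouched, which is one plausible way such an IL signal could be combined with either SARSA or an actor-critic method like A3C.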
Subject
Fluid Flow and Transfer Processes, Computer Science Applications, Process Chemistry and Technology, General Engineering, Instrumentation, General Materials Science
Cited by 5 articles.