Introspective Q-learning and learning from demonstration
Published: 2019
Volume: 34
ISSN: 0269-8889
Container-title: The Knowledge Engineering Review
Language: en
Authors: Mao Li, Tim Brys, Daniel Kudenko
Abstract
One challenge faced by reinforcement learning (RL) agents is that in many environments the reward signal is sparse, leading to slow improvement of the agent’s performance in early learning episodes. Potential-based reward shaping can help to resolve the aforementioned issue of sparse reward by incorporating an expert’s domain knowledge into the learning through a potential function. Past work on reinforcement learning from demonstration (RLfD) directly mapped (sub-optimal) human expert demonstration to a potential function, which can speed up RL. In this paper we propose an introspective RL agent that significantly further speeds up the learning. An introspective RL agent records its state–action decisions and experience during learning in a priority queue. Good quality decisions, according to a Monte Carlo estimation, will be kept in the queue, while poorer decisions will be rejected. The queue is then used as demonstration to speed up RL via reward shaping. A human expert’s demonstration can be used to initialize the priority queue before the learning process starts. Experimental validation in the 4-dimensional CartPole domain and the 27-dimensional Super Mario AI domain shows that our approach significantly outperforms non-introspective RL and state-of-the-art approaches in RLfD in both domains.
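The introspection mechanism described in the abstract — a bounded priority queue that keeps state–action decisions with high Monte Carlo return estimates and rejects poorer ones, then feeds kept decisions into a shaping potential — can be sketched as follows. This is a minimal illustration, not the authors' implementation; the names `IntrospectionQueue`, `offer`, and `potential` and the binary shaping bonus are assumptions for the example.

```python
import heapq

class IntrospectionQueue:
    """Fixed-capacity priority queue of (return, state, action) records.

    Keeps the decisions with the highest Monte Carlo return estimates seen
    so far; once full, a new record displaces the lowest-return entry only
    if its estimated return is higher. (Illustrative sketch, not the
    paper's implementation.)
    """

    def __init__(self, capacity):
        self.capacity = capacity
        self.heap = []  # min-heap keyed on the Monte Carlo return G

    def offer(self, g, state, action):
        """Record a decision with Monte Carlo return estimate g."""
        if len(self.heap) < self.capacity:
            heapq.heappush(self.heap, (g, state, action))
        elif g > self.heap[0][0]:
            # Better than the worst kept decision: replace it.
            heapq.heapreplace(self.heap, (g, state, action))
        # Otherwise the decision is rejected as poor quality.

    def demonstrated(self, state):
        """Actions currently kept as demonstrations for `state`."""
        return {a for _, s, a in self.heap if s == state}

def potential(queue, state, action, bonus=1.0):
    """Toy shaping potential: a bonus when (state, action) matches a
    kept demonstration, zero otherwise."""
    return bonus if action in queue.demonstrated(state) else 0.0
```

A human expert's demonstrations would simply be `offer`-ed into the queue before learning starts, which is how the paper describes initializing the queue.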
Publisher: Cambridge University Press (CUP)
Subjects: Artificial Intelligence, Software
Cited by: 2 articles