Introspective Q-learning and learning from demonstration

Published:2019 Volume:34
ISSN:0269-8889
Container-title:The Knowledge Engineering Review
language:en
Short-container-title:The Knowledge Engineering Review

Author:

Li Mao,Brys Tim,Kudenko Daniel

Abstract

One challenge faced by reinforcement learning (RL) agents is that in many environments the reward signal is sparse, leading to slow improvement of the agent’s performance in early learning episodes. Potential-based reward shaping can help to resolve this issue by incorporating an expert’s domain knowledge into the learning process through a potential function. Past work on reinforcement learning from demonstration (RLfD) directly mapped (sub-optimal) human expert demonstrations to a potential function, which can speed up RL. In this paper we propose an introspective RL agent that speeds up learning significantly further. An introspective RL agent records its state–action decisions and experience during learning in a priority queue. Good-quality decisions, according to a Monte Carlo estimation, are kept in the queue, while poorer decisions are rejected. The queue is then used as a demonstration to speed up RL via reward shaping. A human expert’s demonstration can be used to initialize the priority queue before the learning process starts. Experimental validation in the 4-dimensional CartPole domain and the 27-dimensional Super Mario AI domain shows that our approach significantly outperforms non-introspective RL and state-of-the-art RLfD approaches in both domains.
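
The mechanism described in the abstract can be illustrated with a minimal Python sketch. It is not the authors' implementation. The class and function names, the fixed queue capacity, and the binary potential (1.0 for a pair stored in the queue, 0.0 otherwise) are all assumptions made for illustration; the paper's actual potential function and queue policy may differ. The shaping term uses the standard state–action potential-based form F = gamma * Phi(s', a') - Phi(s, a).

```python
import heapq
import itertools


class DemonstrationQueue:
    """Bounded priority queue that keeps the highest-return (state, action) records.

    Hypothetical helper: the capacity and eviction rule are illustrative assumptions.
    """

    def __init__(self, capacity):
        self.capacity = capacity
        self._heap = []                # min-heap keyed on the Monte Carlo return
        self._tie = itertools.count()  # tie-breaker so states are never compared

    def add(self, state, action, mc_return):
        entry = (mc_return, next(self._tie), state, action)
        if len(self._heap) < self.capacity:
            heapq.heappush(self._heap, entry)
        elif mc_return > self._heap[0][0]:
            heapq.heapreplace(self._heap, entry)  # evict the worst stored record
        # otherwise the new decision is rejected

    def records(self):
        return [(s, a) for _, _, s, a in self._heap]


def monte_carlo_returns(rewards, gamma):
    """Discounted return from each time step to the end of the episode."""
    returns, g = [], 0.0
    for r in reversed(rewards):
        g = r + gamma * g
        returns.append(g)
    return returns[::-1]


def potential(queue, state, action):
    """Assumed binary potential: 1.0 if the exact pair is stored, else 0.0."""
    return 1.0 if (state, action) in queue.records() else 0.0


def shaping_reward(queue, s, a, s_next, a_next, gamma):
    """Potential-based shaping term F = gamma * Phi(s', a') - Phi(s, a)."""
    return gamma * potential(queue, s_next, a_next) - potential(queue, s, a)
```

In such a sketch, a human demonstration could seed the queue by calling `add` for each demonstrated pair before learning starts, and the agent would add `shaping_reward(...)` to the environment reward in its Q-learning update.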

Publisher

Cambridge University Press (CUP)

Subject

Artificial Intelligence,Software

References (24 articles)

1. Wiewiora, E., Cottrell, G. & Elkan, C. 2003. Principled methods for advising reinforcement learning agents. In ICML. 792–799.

2. Policy invariance under reward transformations: theory and application to reward shaping;Ng;Proceedings of the Sixteenth International Conference on Machine Learning,1999

3. The Mario AI Benchmark and Competitions

Cited by 2 articles.

1. Model-free reinforcement learning from expert demonstrations: a survey;Artificial Intelligence Review;2021-10-18

2. Special issue on adaptive and learning agents 2018;The Knowledge Engineering Review;2021