Automatic landmark discovery for learning agents under partial observability

Published:2019 Volume:34
ISSN:0269-8889
Container-title:The Knowledge Engineering Review
language:en
Short-container-title:The Knowledge Engineering Review

Author:

Demir Alper, Çilden Erkin, Polat Faruk

Abstract

In the reinforcement learning context, a landmark is a compact piece of information that is uniquely coupled with a state, for problems with hidden states. Landmarks are shown to support finding good memoryless policies for Partially Observable Markov Decision Processes (POMDPs) that contain at least one landmark. SarsaLandmark, an adaptation of Sarsa(λ), is known to promise better learning performance under the assumption that all landmarks of the problem are known in advance. In this paper, we propose a framework built upon SarsaLandmark that automatically identifies landmarks within the problem during learning, without sacrificing quality and without requiring prior information about the problem structure. For this purpose, the framework fuses SarsaLandmark with a well-known multiple-instance learning algorithm, namely Diverse Density (DD). Through further experimentation, we also provide deeper insight into our concept filtering heuristic for accelerating DD, abbreviated as DDCF (Diverse Density with Concept Filtering), which proves suitable for POMDPs with landmarks. DDCF outperforms its antecedent in terms of computation speed and solution quality without loss of generality. The methods are empirically shown to be effective via extensive experimentation on a number of known and newly introduced problems with hidden state, and the results are discussed.

Publisher

Cambridge University Press (CUP)

Subject

Artificial Intelligence,Software

Reference: 40 articles.

1. Discovering global network communities based on local centralities

2. Wikipedia 2018. Landmark. https://en.wikipedia.org/wiki/Landmark (visited on 22 January 2018).

3. Between MDPs and semi-MDPs: A framework for temporal abstraction in reinforcement learning

4. Learning Options in Reinforcement Learning

Cited by 4 articles.

1. Potential-based reward shaping using state–space segmentation for efficiency in reinforcement learning;Future Generation Computer Systems;2024-08

2. Population-based exploration in reinforcement learning through repulsive reward shaping using eligibility traces;Annals of Operations Research;2024-01-18

3. Landmark based guidance for reinforcement learning agents under partial observability;International Journal of Machine Learning and Cybernetics;2022-11-16

4. Using chains of bottleneck transitions to decompose and solve reinforcement learning tasks with hidden states;Future Generation Computer Systems;2022-08