Faster MIL-based Subgoal Identification for Reinforcement Learning by Tuning Fewer Hyperparameters

Published:2024-04-20 Issue:2 Volume:19 Page:1-29
ISSN:1556-4665
Container-title:ACM Transactions on Autonomous and Adaptive Systems
Language:en
Short-container-title:ACM Trans. Auton. Adapt. Syst.

Author:

Sunel Saim¹, Çilden Erkin², Polat Faruk¹

Affiliation:

1. Department of Computer Engineering, Middle East Technical University, Ankara, Turkey

2. RF and Simulation Systems Directorate, STM Defense Technologies Engineering and Trade Inc., Ankara, Turkey

Abstract

Various methods have been proposed in the literature for identifying subgoals in discrete reinforcement learning (RL) tasks. Once subgoals are discovered, task decomposition methods can be employed to improve the learning performance of agents. In this study, we classify prominent subgoal identification methods for discrete RL tasks into three categories: graph-based, statistics-based, and multi-instance learning (MIL)-based. Our contributions are threefold. First, we introduce a new MIL-based subgoal identification algorithm called EMDD-RL and experimentally compare it with a previous MIL-based method. The previous approach adapts MIL’s Diverse Density (DD) algorithm, whereas our method builds on Expectation-Maximization Diverse Density (EMDD). The advantage of EMDD over DD is that it can yield more accurate results with lower computational demand thanks to the expectation-maximization algorithm. EMDD-RL modifies some of the algorithmic steps of EMDD to identify subgoals in discrete RL problems. Second, we evaluate the methods on several RL tasks with respect to the hyperparameter tuning overhead they incur. Third, we propose a new RL problem called key-room and compare the methods' subgoal identification performance on this new task. Experimental results show that MIL-based subgoal identification methods could be preferred in practice to the algorithms of the other two categories.

Funder

National Scholarship Programme for MSc students of the Scientific and Technological Research Council of Turkey

Publisher

Association for Computing Machinery (ACM)

Link

https://dl.acm.org/doi/pdf/10.1145/3643852

Reference: 29 articles.

1. Using chains of bottleneck transitions to decompose and solve reinforcement learning tasks with hidden states

2. Akhil Bagaria and George Konidaris. 2019. Option discovery using deep skill chaining. In Proceedings of the International Conference on Learning Representations.

3. Using relative novelty to identify useful temporal abstractions in reinforcement learning

4. Identifying useful subgoals in reinforcement learning by local graph partitioning

5. Deriving Subgoals Autonomously to Accelerate Learning in Sparse Reward Domains