Author:
Yan Jiangyue, Luo Biao, Xu Xiaodong
Abstract
Reinforcement learning (RL) has achieved remarkable advances in navigation tasks in recent years. However, multi-goal navigation with sparse rewards remains a complex and challenging problem because of the long decision sequences involved. Such tasks inherently involve a hybrid action space: the robot must first select a navigation endpoint and then execute primitive actions to reach it. To address the problem of multi-goal navigation with sparse rewards, we introduce a novel hierarchical RL framework named Hierarchical RL with Multi-Goal (HRL-MG). The main idea of HRL-MG is to divide and conquer the hybrid action space, splitting long-sequence decisions into short-sequence decisions. The HRL-MG framework is composed of two main modules: a selector and an actuator. The selector employs a temporal-abstraction hierarchical architecture designed to specify a desired end goal over the discrete action space, while the actuator uses a continuous goal-oriented hierarchical architecture to execute continuous action sequences that reach the end goal specified by the selector. In addition, we incorporate a dynamic goal detection mechanism, grounded in hindsight experience replay, to mitigate the challenges posed by sparse rewards. We validated the algorithm’s efficacy in both the discrete Maze_2D environment and the continuous MuJoCo ‘Ant’ robotic environment. The results indicate that HRL-MG significantly outperforms other methods on multi-goal navigation tasks with sparse rewards.
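To make the hindsight-based relabelling idea concrete, below is a minimal sketch of goal relabelling for sparse-reward transitions, assuming a position-style goal space and the common "future" relabelling strategy from hindsight experience replay. All names (Transition, relabel_with_hindsight, sparse_reward) and the tolerance parameter are illustrative assumptions, not the paper's implementation of the dynamic goal detection mechanism.

```python
import random
from dataclasses import dataclass, field

@dataclass
class Transition:
    state: tuple
    action: tuple
    goal: tuple          # goal the low-level policy was conditioned on
    next_state: tuple
    reward: float

@dataclass
class EpisodeBuffer:
    transitions: list = field(default_factory=list)

def sparse_reward(achieved, goal, tol=0.5):
    """Binary sparse reward: 0 if the achieved position is within tol of the goal, else -1."""
    dist = sum((a - g) ** 2 for a, g in zip(achieved, goal)) ** 0.5
    return 0.0 if dist <= tol else -1.0

def relabel_with_hindsight(episode, k=4):
    """Augment an episode by replaying each transition against goals that were
    actually reached later in the same trajectory (the 'future' strategy)."""
    augmented = list(episode.transitions)
    n = len(episode.transitions)
    for i, t in enumerate(episode.transitions):
        # Sample up to k states visited after step i and treat them as goals.
        for j in random.sample(range(i, n), min(k, n - i)):
            new_goal = episode.transitions[j].next_state
            augmented.append(Transition(
                state=t.state,
                action=t.action,
                goal=new_goal,
                next_state=t.next_state,
                reward=sparse_reward(t.next_state, new_goal),
            ))
    return augmented
```

In a hierarchical setup of the kind the abstract describes, a high-level selector would pick the desired endpoint while the low-level actuator trains on transitions augmented this way, so that even episodes that never reach the selected endpoint still yield informative, non-negative rewards for goals that were actually achieved.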
Funder
National Natural Science Foundation of China
Publisher
Springer Science and Business Media LLC
References (50 articles)
1. Abbeel P, Ng AY (2004) Apprenticeship learning via inverse reinforcement learning. In: Proceedings of the 21st international conference on machine learning, p 1
2. Andrychowicz M, Wolski F, Ray A et al (2017) Hindsight experience replay. In: Proceedings of the 31st international conference on neural information processing systems, pp 5055–5065
3. Bacon PL, Harb J, Precup D (2017) The option-critic architecture. In: Proceedings of the AAAI conference on artificial intelligence, pp 1726–1734
4. Barto AG, Mahadevan S (2003) Recent advances in hierarchical reinforcement learning. Discret Event Dyn Syst 13(1–2):41–77
5. Bellemare M, Srinivasan S, Ostrovski G et al (2016) Unifying count-based exploration and intrinsic motivation. In: Advances in neural information processing systems 29 (NIPS 2016), pp 1471–1479