Author:
Yan Jiangyue, Luo Biao, Xu Xiaodong
Abstract
Reinforcement learning (RL) has achieved remarkable advances in navigation tasks in recent years. However, multi-goal navigation with sparse rewards remains a complex and challenging problem because of the long decision sequences involved. Such tasks inherently involve a hybrid action space: the robot must first select a navigation endpoint and then execute primitive actions to reach it. To address the problem of multi-goal navigation with sparse rewards, we introduce a novel hierarchical RL framework named Hierarchical RL with Multi-Goal (HRL-MG). The main idea of HRL-MG is to divide and conquer the hybrid action space, splitting long-sequence decisions into short-sequence decisions. The HRL-MG framework is composed of two main modules: a selector and an actuator. The selector employs a temporal-abstraction hierarchical architecture designed to specify a desired end goal over the discrete action space, while the actuator uses a continuous goal-oriented hierarchical architecture to execute continuous action sequences that reach the end goal specified by the selector. In addition, we incorporate a dynamic goal detection mechanism, grounded in hindsight experience replay, to mitigate the challenges posed by sparse rewards. We validated the algorithm’s efficacy in both the discrete Maze_2D environment and the continuous MuJoCo ‘Ant’ robotic environment. The results indicate that HRL-MG significantly outperforms other methods on multi-goal navigation tasks with sparse rewards.
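To make the hindsight-based relabelling idea concrete, below is a minimal sketch of goal relabelling for sparse-reward transitions, assuming a position-style goal space and the common "future" relabelling strategy from hindsight experience replay. All names (Transition, relabel_with_hindsight, sparse_reward) and the tolerance parameter are illustrative assumptions, not the paper's implementation of the dynamic goal detection mechanism.

```python
import random
from dataclasses import dataclass, field

@dataclass
class Transition:
    state: tuple
    action: tuple
    goal: tuple          # goal the low-level policy was conditioned on
    next_state: tuple
    reward: float

@dataclass
class EpisodeBuffer:
    transitions: list = field(default_factory=list)

def sparse_reward(achieved, goal, tol=0.5):
    """Binary sparse reward: 0 if the achieved position is within tol of the goal, else -1."""
    dist = sum((a - g) ** 2 for a, g in zip(achieved, goal)) ** 0.5
    return 0.0 if dist <= tol else -1.0

def relabel_with_hindsight(episode, k=4):
    """Augment an episode by replaying each transition against goals that were
    actually reached later in the same trajectory (the 'future' strategy)."""
    augmented = list(episode.transitions)
    n = len(episode.transitions)
    for i, t in enumerate(episode.transitions):
        # Sample up to k states visited after step i and treat them as goals.
        for j in random.sample(range(i, n), min(k, n - i)):
            new_goal = episode.transitions[j].next_state
            augmented.append(Transition(
                state=t.state,
                action=t.action,
                goal=new_goal,
                next_state=t.next_state,
                reward=sparse_reward(t.next_state, new_goal),
            ))
    return augmented
```

In a hierarchical setup of the kind the abstract describes, a high-level selector would pick the desired endpoint while the low-level actuator trains on transitions augmented this way, so that even episodes that never reach the selected endpoint still yield informative, non-negative rewards for goals that were actually achieved.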
Funder
National Natural Science Foundation of China
Publisher
Springer Science and Business Media LLC
References (50 articles)
1. Abbeel P, Ng AY (2004) Apprenticeship learning via inverse reinforcement learning. In: Proceedings of the 21st international conference on machine learning, p 1
2. Andrychowicz M, Wolski F, Ray A et al (2017) Hindsight experience replay. In: Proceedings of the 31st international conference on neural information processing systems, pp 5055–5065
3. Bacon PL, Harb J, Precup D (2017) The option-critic architecture. In: Proceedings of the AAAI conference on artificial intelligence, pp 1726–1734
4. Barto AG, Mahadevan S (2003) Recent advances in hierarchical reinforcement learning. Discret Event Dyn Syst 13(1–2):41–77
5. Bellemare M, Srinivasan S, Ostrovski G et al (2016) Unifying count-based exploration and intrinsic motivation. In: Advances in neural information processing systems 29 (NIPS 2016), pp 1471–1479