Hierarchical reinforcement learning for handling sparse rewards in multi-goal navigation

Authors:

Yan Jiangyue, Luo Biao, Xu Xiaodong

Abstract

Reinforcement learning (RL) has achieved remarkable advances in navigation tasks in recent years. However, multi-goal navigation with sparse rewards remains a complex and challenging problem because of the long-horizon decision-making it requires. Such tasks inherently involve a hybrid action space: the robot must first select a navigation endpoint and then execute primitive actions to reach it. To address multi-goal navigation with sparse rewards, we introduce a novel hierarchical RL framework named Hierarchical RL with Multi-Goal (HRL-MG). The main idea of HRL-MG is to divide and conquer the hybrid action space, splitting long-sequence decisions into short-sequence ones. The framework comprises two main modules: a selector and an actuator. The selector employs a temporally abstract hierarchical architecture that specifies a desired end goal from the discrete action space, while the actuator uses a continuous, goal-conditioned hierarchical architecture that executes continuous action sequences to reach the goal specified by the selector. In addition, we incorporate a dynamic goal detection mechanism, grounded in hindsight experience replay, to mitigate the challenges posed by sparse rewards. We validated the algorithm's efficacy in both the discrete Maze_2D environment and the continuous MuJoCo 'Ant' robotic environment. The results indicate that HRL-MG significantly outperforms other methods on multi-goal navigation tasks with sparse rewards.
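The abstract grounds its dynamic goal detection mechanism in hindsight experience replay (HER). The paper's own implementation is not shown here; as an illustration only, the following is a minimal sketch of HER's "future" relabeling strategy, in which each transition of a failed episode is stored again with substitute goals drawn from states actually reached later in that episode, so that sparse rewards become informative. All names (`her_relabel`, the tuple layout) are hypothetical.

```python
import random

def her_relabel(trajectory, k=4, rng=random):
    """Relabel an episode with hindsight goals ("future" strategy).

    trajectory: list of (state, action, next_state, goal) tuples.
    Returns (state, action, next_state, goal, reward) tuples: each
    original transition plus up to k copies whose goal is replaced by a
    state achieved later in the same episode. The sparse reward is 1.0
    only when next_state matches the (possibly substituted) goal.
    """
    relabeled = []
    for t, (state, action, next_state, goal) in enumerate(trajectory):
        # Original transition under the original (often unreached) goal.
        reward = 1.0 if next_state == goal else 0.0
        relabeled.append((state, action, next_state, goal, reward))
        # Goals actually achieved from step t onward in this episode.
        future = [tr[2] for tr in trajectory[t:]]
        for _ in range(min(k, len(future))):
            new_goal = rng.choice(future)
            new_reward = 1.0 if next_state == new_goal else 0.0
            relabeled.append((state, action, next_state, new_goal, new_reward))
    return relabeled

# Toy episode on a line: the agent moves right but never reaches goal 9,
# so every original transition earns zero reward; relabeling recovers
# successful transitions for the goals it did reach.
traj = [(0, "right", 1, 9), (1, "right", 2, 9)]
out = her_relabel(traj, k=4, rng=random.Random(0))
```

The same relabeled transitions can then feed any off-policy learner; in a hierarchical setup like the one described above, relabeling would apply to the actuator's goal-conditioned transitions.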

Funder

National Natural Science Foundation of China

Publisher

Springer Science and Business Media LLC

