Prioritized experience replay in path planning via multi-dimensional transition priority fusion

Author:

Cheng Nuo, Wang Peng, Zhang Guangyuan, Ni Cui, Nematov Erkin

Abstract

Introduction: Deep deterministic policy gradient (DDPG)-based path planning algorithms for intelligent robots struggle to discern the value of experience transitions during training because they rely on random experience replay. This can lead to inappropriate sampling of experience transitions and overemphasis on edge experience transitions. As a result, the algorithm converges more slowly and the success rate of path planning drops.

Methods: We comprehensively examine the impact of the immediate reward, the temporal-difference error (TD-error), and the Actor network loss function on the training process, and calculate an experience transition priority from each of these three factors. Using information entropy as a weight, we then merge the three priorities to determine the final priority of each experience transition. In addition, we introduce a method for adaptively adjusting the priority of positive experience transitions, so that training focuses on them while the sampled distribution stays balanced. Finally, the sampling probability of each experience transition is derived from its priority.

Results: In the experiments, our method had a shorter test time than the PER algorithm and fewer collisions with obstacles. This indicates that the fused priority accurately gauges the significance of distinct experience transitions for training the path planning algorithm.

Discussion: The method improves the utilization of experience transitions and the convergence speed of the algorithm, and also raises the success rate of path planning.
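The abstract gives no formulas, so the following is only an illustrative sketch of the pipeline described under Methods: three per-transition priorities, entropy-based weights, a fused priority, and sampling probabilities. It assumes the standard entropy weight method for the fusion and the usual PER-style rule that probability is proportional to priority raised to a power `alpha`. The function names, the use of absolute magnitudes as the three priorities, the max scaling, and the values of `alpha` and `eps` are hypothetical choices, not the authors' definitions. The adaptive adjustment for positive transitions is not modeled here.

```python
import numpy as np


def entropy_weights(scores: np.ndarray, eps: float = 1e-12) -> np.ndarray:
    """Entropy weight method (assumed here, not taken from the paper).

    scores: non-negative array of shape (n_transitions, n_criteria), n_transitions >= 2.
    Returns one weight per criterion; the weights sum to 1.
    """
    n = scores.shape[0]
    # Turn each column into a distribution over transitions.
    p = scores / (scores.sum(axis=0, keepdims=True) + eps)
    # Shannon entropy of each column, scaled to [0, 1] by log(n).
    entropy = -(p * np.log(p + eps)).sum(axis=0) / np.log(n)
    # Columns that discriminate more between transitions (lower entropy) get more weight.
    diversity = np.clip(1.0 - entropy, 0.0, None)
    total = diversity.sum()
    if total <= eps:
        # No column discriminates at all: fall back to equal weights.
        return np.full(scores.shape[1], 1.0 / scores.shape[1])
    return diversity / total


def fused_priority(reward, td_error, actor_loss, eps: float = 1e-12):
    """Fuse three per-transition signals into one priority (hypothetical definition)."""
    # Assumption: the magnitude of each signal serves as its individual priority.
    scores = np.stack(
        [np.abs(reward), np.abs(td_error), np.abs(actor_loss)], axis=1
    ).astype(float)
    # Scale each column to [0, 1] so the three signals are comparable.
    scores = scores / (scores.max(axis=0, keepdims=True) + eps)
    weights = entropy_weights(scores)
    # Weighted sum across the three criteria gives the fused priority.
    return scores @ weights, weights


def sampling_probabilities(priority: np.ndarray, alpha: float = 0.6, eps: float = 1e-6):
    """PER-style rule: P(i) is proportional to (priority_i + eps) ** alpha."""
    scaled = (priority + eps) ** alpha
    return scaled / scaled.sum()
```

A replay buffer could then draw a minibatch of indices with `np.random.default_rng().choice(len(probs), size=batch_size, p=probs)`, where `probs` is the output of `sampling_probabilities` and `batch_size` is chosen by the user.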

Publisher

Frontiers Media SA

Subject

Artificial Intelligence, Biomedical Engineering

