Attention-based advantage actor-critic algorithm with prioritized experience replay for complex 2-D robotic motion planning

Published:2022-08-07 Issue:1 Volume:34 Page:151-180
ISSN:0956-5515
Container-title:Journal of Intelligent Manufacturing
language:en
Short-container-title:J Intell Manuf

Author:

Zhou Chengmin, Huang Bingding, Hassan Haseeb, Fränti Pasi

Abstract

Robotic motion planning in dense and dynamic indoor scenarios remains challenging because the motion of obstacles is unpredictable. Recent progress in reinforcement learning enables robots to cope better with dense and unpredictable obstacles by encoding complex features of the robot and obstacles with encoders such as long short-term memory (LSTM). The robot then learns from these features using reinforcement learning algorithms such as the deep Q-network and the asynchronous advantage actor-critic algorithm. However, existing methods depend heavily on expert experiences, using imitation learning to initialize the networks and speed up convergence. Moreover, LSTM-based approaches to encoding obstacle features are not always efficient or robust, and sometimes cause the network to overfit during training. This paper focuses on the advantage actor-critic algorithm and introduces an attention-based actor-critic algorithm with experience replay that improves on the existing algorithm in two respects. First, the LSTM encoder is replaced by a robust attention-weight encoder to better interpret the complex features of the robot and obstacles. Second, the robot learns from its past prioritized experiences to initialize the networks of the advantage actor-critic algorithm. This is achieved by applying the prioritized experience replay method, which makes the best use of useful past experiences to improve the convergence speed. As a result, the network based on our algorithm needs only around 15% and 30% of the experiences to get past the early stage of training without expert experiences in cases with five and ten obstacles, respectively. It then converges faster to a better reward with fewer experiences (about 45% and 65% of the experiences in cases with ten and five obstacles, respectively) than the baseline LSTM-based advantage actor-critic algorithm.
Our source code is freely available on GitHub (https://github.com/CHUENGMINCHOU/AW-PER-A2C).
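
The two ideas named in the abstract, attention weights over obstacle features and prioritized experience replay, can be illustrated with a minimal Python sketch. This is not the authors' implementation (see the repository above for that). It assumes generic softmax attention pooling and the common proportional prioritization scheme, and every name in it (`attention_pool`, `PrioritizedReplay`, `alpha`, `beta`, `eps`) is hypothetical and chosen for illustration only.

```python
import math
import random


def attention_pool(features, scores):
    """Softmax the per-obstacle scores; return (weights, weighted feature sum).

    Assumption: one scalar score per obstacle, produced elsewhere (e.g. by a
    small network). Here the scores are simply passed in.
    """
    m = max(scores)  # subtract the max for numerical stability
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    weights = [e / total for e in exps]
    dim = len(features[0])
    pooled = [sum(w * f[d] for w, f in zip(weights, features)) for d in range(dim)]
    return weights, pooled


class PrioritizedReplay:
    """Proportional prioritization: P(i) = p_i^alpha / sum_k p_k^alpha."""

    def __init__(self, capacity, alpha=0.6, eps=1e-6):
        self.capacity = capacity
        self.alpha = alpha  # 0 gives uniform sampling, 1 gives fully proportional
        self.eps = eps      # keeps every priority strictly positive
        self.data = []
        self.priorities = []
        self.pos = 0        # next slot to overwrite once the buffer is full

    def add(self, transition):
        # New transitions take the current maximum priority so that they
        # are likely to be replayed before their error is known.
        p = max(self.priorities, default=1.0)
        if len(self.data) < self.capacity:
            self.data.append(transition)
            self.priorities.append(p)
        else:
            self.data[self.pos] = transition
            self.priorities[self.pos] = p
        self.pos = (self.pos + 1) % self.capacity

    def sample(self, batch_size, beta=0.4, rng=random):
        scaled = [p ** self.alpha for p in self.priorities]
        total = sum(scaled)
        probs = [s / total for s in scaled]
        idx = rng.choices(range(len(self.data)), weights=probs, k=batch_size)
        # Importance-sampling weights correct the bias of non-uniform sampling.
        # Simplification: they are normalized by the largest weight in the batch.
        n = len(self.data)
        is_w = [(n * probs[i]) ** (-beta) for i in idx]
        max_w = max(is_w)
        is_w = [w / max_w for w in is_w]
        return idx, [self.data[i] for i in idx], is_w

    def update(self, indices, errors):
        # After a learning step, re-prioritize by the magnitude of the error.
        for i, err in zip(indices, errors):
            self.priorities[i] = abs(err) + self.eps
```

In an actor-critic setting, one plausible choice is to pass the magnitude of the temporal-difference error (or of the advantage estimate) to `update`, so that transitions the critic predicts poorly are replayed more often. Whether the paper uses exactly this quantity is not stated in the abstract.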

Funder

University of Eastern Finland (UEF) including Kuopio University Hospital

Publisher

Springer Science and Business Media LLC

Subject

Artificial Intelligence,Industrial and Manufacturing Engineering,Software

Link

https://link.springer.com/content/pdf/10.1007/s10845-022-01988-z.pdf

Reference: 48 articles.

1. Bai, Z., Cai, B., Shangguan, W., & Chai, L. (2019). Deep learning based motion planning for autonomous vehicle using spatiotemporal LSTM network. In Proceedings 2018 Chinese Automation Congress, CAC 2018 (pp. 1610–1614). https://doi.org/10.1109/CAC.2018.8623233

2. Baird, L. (1995). Residual algorithms: Reinforcement learning with function approximation. Machine Learning Proceedings, 1995, 30–37. https://doi.org/10.1016/b978-1-55860-377-6.50013-x

3. Barton, M., Shragai, N., & Elber, G. (2009). Kinematic simulation of planar and spatial mechanisms using a polynomial constraints solver. Computer-Aided Design and Applications, 6(1), 115–123. https://doi.org/10.3722/cadaps.2009.115-123

4. Bas, E. (2019). An introduction to Markov chains. Basics of Probability and Stochastic Processes. https://doi.org/10.1007/978-3-030-32323-3_12

5. Brownlee, J. (2018). Better deep learning: train faster, reduce overfitting, and make better predictions. Machine Learning Mastery. Retrieved from https://machinelearningmastery.com/better-deep-learning/.

Cited by 4 articles.

1. Robotic Manipulator in Dynamic Environment with SAC Combing Attention Mechanism and LSTM;Electronics;2024-05-17

2. Memory-based soft actor–critic with prioritized experience replay for autonomous navigation;Intelligent Service Robotics;2024-02-29

3. Artificial Intelligence Algorithms in Flood Prediction: A General Overview;Geo-information for Disaster Monitoring and Management;2024

4. A mixed perception-based human-robot collaborative maintenance approach driven by augmented reality and online deep reinforcement learning;Robotics and Computer-Integrated Manufacturing;2023-10