A self-learning Monte Carlo tree search algorithm for robot path planning

Published:2023-07-06 Volume:17
ISSN:1662-5218
Container-title:Frontiers in Neurorobotics
Short-container-title:Front. Neurorobot.

Author:

Li Wei, Liu Yi, Ma Yan, Xu Kang, Qiu Jiang, Gan Zhongxue

Abstract

This paper proposes a self-learning Monte Carlo tree search algorithm (SL-MCTS) that continuously improves its problem-solving ability in single-player scenarios. SL-MCTS combines the MCTS algorithm with a two-branch neural network (PV-Network). The MCTS architecture balances exploration and exploitation during the search. PV-Network replaces the rollout process of MCTS and predicts both the promising search direction and the value of nodes, which increases the convergence speed and search efficiency of MCTS. The paper also proposes a method for assessing the trajectory of the current model during self-learning by comparing its performance with that of the best-performing historical model. This method additionally encourages SL-MCTS to generate optimal solutions during self-learning. We evaluate SL-MCTS on a robot path planning scenario. The experimental results show that SL-MCTS far outperforms the traditional MCTS and single-player MCTS algorithms in path quality and time consumption; in particular, its time consumption is roughly half that of the traditional MCTS algorithms. SL-MCTS also performs comparably to other iteration-based search algorithms designed specifically for path planning tasks.
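The search loop described in the abstract, in which a policy-value predictor stands in for the random rollout, can be illustrated with a minimal sketch. The abstract does not give the selection formula, the network architecture, or the reward design, so everything below is an assumption: the selection rule is the PUCT formula familiar from AlphaZero-style search, `pv_stub` is a hand-written heuristic standing in for a trained PV-Network, and the grid, `START`, `GOAL`, and all function names are hypothetical.

```python
import math

# Hypothetical 4x4 grid (not from the paper): 0 = free cell, 1 = obstacle.
GRID = [
    [0, 0, 0, 0],
    [0, 1, 1, 0],
    [0, 0, 0, 0],
    [0, 1, 0, 0],
]
START, GOAL = (0, 0), (3, 3)
MOVES = [(-1, 0), (1, 0), (0, -1), (0, 1)]


def legal_moves(pos):
    """Return the moves that stay on the grid and avoid obstacles."""
    result = []
    for dr, dc in MOVES:
        r, c = pos[0] + dr, pos[1] + dc
        if 0 <= r < len(GRID) and 0 <= c < len(GRID[0]) and GRID[r][c] == 0:
            result.append((dr, dc))
    return result


def pv_stub(pos):
    """Stand-in for a trained policy-value network (an assumption).

    Policy branch: uniform prior over legal moves.
    Value branch: 1 minus the normalised Manhattan distance to the goal.
    """
    moves = legal_moves(pos)
    priors = {m: 1.0 / len(moves) for m in moves}
    dist = abs(GOAL[0] - pos[0]) + abs(GOAL[1] - pos[1])
    value = 1.0 - dist / (len(GRID) + len(GRID[0]) - 2)
    return priors, value


class Node:
    def __init__(self, pos, prior):
        self.pos = pos
        self.prior = prior
        self.children = {}  # move -> Node
        self.visits = 0
        self.value_sum = 0.0

    def q(self):
        """Mean backed-up value; 0 for an unvisited node."""
        return self.value_sum / self.visits if self.visits else 0.0


def select_child(node, c_puct=1.5):
    """PUCT: exploit high mean value, explore high-prior, rarely visited moves."""
    def score(child):
        u = c_puct * child.prior * math.sqrt(node.visits) / (1 + child.visits)
        return child.q() + u
    return max(node.children.values(), key=score)


def simulate(root, max_depth=20):
    """One iteration: select down the tree, expand a leaf, back up its value."""
    path, node = [root], root
    while node.children and node.pos != GOAL and len(path) < max_depth:
        node = select_child(node)
        path.append(node)
    if node.pos == GOAL:
        value = 1.0
    else:
        priors, value = pv_stub(node.pos)  # used in place of a random rollout
        if not node.children:
            for move, p in priors.items():
                nxt = (node.pos[0] + move[0], node.pos[1] + move[1])
                node.children[move] = Node(nxt, p)
    # Single-player setting: the same value is backed up without sign flips.
    for visited in path:
        visited.visits += 1
        visited.value_sum += value


def plan(n_sim=200, max_steps=12):
    """Run a fresh search at each step and take the most visited move."""
    pos, path = START, [START]
    while pos != GOAL and len(path) <= max_steps:
        root = Node(pos, 1.0)
        for _ in range(n_sim):
            simulate(root)
        best = max(root.children.values(), key=lambda ch: ch.visits)
        pos = best.pos
        path.append(pos)
    return path
```

Calling `plan()` returns a list of grid cells beginning at `START`; the step cap `max_steps` bounds its length whether or not the goal is reached. In a self-learning setup along the lines the abstract describes, `pv_stub` would be replaced by a network trained on the paths the search itself produces, with each new path scored against the best historical model; that training loop is not sketched here.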

Funder

Ji Hua Laboratory

Science and Technology Commission of Shanghai Municipality

Publisher

Frontiers Media SA

Subject

Artificial Intelligence, Biomedical Engineering

Cited by 1 article.

1. An adaptive control framework based multi-modal information-driven dance composition model for musical robots;Frontiers in Neurorobotics;2023-10-09