A reinforcement learning application of a guided Monte Carlo Tree Search algorithm for beam orientation selection in radiation therapy

Published:2021-05-13 Issue:3 Volume:2 Page:035013
ISSN:2632-2153
Container-title:Machine Learning: Science and Technology
Short-container-title:Mach. Learn.: Sci. Technol.

Author:

Sadeghnejad-Barkousaraie Azar, Bohara Gyanendra, Jiang Steve, Nguyen Dan

Abstract

Current beam orientation optimization algorithms for radiotherapy, such as column generation (CG), are typically heuristic or greedy because of the size of the combinatorial problem, which leads to suboptimal solutions. We propose a reinforcement learning strategy using Monte Carlo Tree Search (MCTS) that can find a better beam orientation set in less time than CG. The strategy uses a supervised learning network to guide the MCTS as it explores the decision space of the beam orientation selection problem. We previously trained a deep neural network (DNN) that takes in the patient anatomy, organ weights, and current beams, then approximates beam fitness values to indicate the next best beam to add. Here, we use this DNN to probabilistically guide the traversal of the branches of the Monte Carlo decision tree when adding a new beam to the plan. To assess the feasibility of the algorithm, we solved five-beam plans for a test set of 13 prostate cancer patients, distinct from the 57 patients originally used to train and validate the DNN. To show the strength of the guided MCTS (GTS) relative to other search methods, we also report the performance of the Guided Search, Uniform Tree Search, and Random Search algorithms. On average, GTS outperformed all the other methods. It found a better solution than CG in 237 s on average, compared to 360 s for CG, and outperformed all other methods in finding a solution with a lower objective function value in less than 1000 s. With GTS, we maintained planning target volume (PTV) coverage within 1% error, similar to CG, while reducing the organ-at-risk mean dose for the body, rectum, and left and right femoral heads; the mean dose to the bladder was 1% higher with GTS than with CG.
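The guided search described in the abstract can be illustrated with a small sketch. This is not the authors' implementation: it is a simplified, policy-guided tree sampling loop under stated assumptions. The names `guided_tree_search`, `toy_policy`, `toy_objective`, `TOY_COST`, and `CANDIDATES` are hypothetical, the trained DNN is replaced by a hand-written scoring function, and the treatment-plan objective is replaced by a toy cost table. The visit-count discount used to encourage exploration is also an assumption, chosen only to keep the example short.

```python
import math
import random

# Hypothetical candidate beam angles (degrees) and a toy per-beam cost table.
# In the paper the objective comes from plan optimization; this is a stand-in.
CANDIDATES = list(range(0, 360, 30))
TOY_COST = {beam: (beam * 7) % 11 + 1 for beam in CANDIDATES}


def toy_objective(plan):
    """Stand-in objective: lower is better. Integer sum, order-independent."""
    return sum(TOY_COST[beam] for beam in plan)


def toy_policy(plan, remaining):
    """Stand-in for the DNN: one positive fitness score per remaining beam."""
    return [1.0 / TOY_COST[beam] for beam in remaining]


def guided_tree_search(candidates, plan_size, policy, objective, iterations, rng):
    """Repeatedly walk the decision tree from the empty plan to a full plan,
    sampling each next beam with probability proportional to the policy score
    discounted by how often that branch was already visited."""
    visits = {}  # partial plan (tuple of beams) -> number of traversals
    best_plan, best_value = None, math.inf
    for _ in range(iterations):
        plan = ()
        while len(plan) < plan_size:
            remaining = [b for b in candidates if b not in plan]
            scores = policy(plan, remaining)
            # Discount well-explored branches so other branches get sampled.
            weights = [
                s / (1 + visits.get(plan + (b,), 0))
                for b, s in zip(remaining, scores)
            ]
            beam = rng.choices(remaining, weights=weights, k=1)[0]
            plan = plan + (beam,)
            visits[plan] = visits.get(plan, 0) + 1
        value = objective(plan)
        if value < best_value:
            best_plan, best_value = tuple(sorted(plan)), value
    return best_plan, best_value


if __name__ == "__main__":
    plan, value = guided_tree_search(
        CANDIDATES, 5, toy_policy, toy_objective, 200, random.Random(0)
    )
    print("best five-beam set:", plan, "objective:", value)
```

Replacing `toy_policy` with a function that returns equal scores gives a uniform tree search, and always taking the highest-scoring beam instead of sampling gives a greedy guided search. These correspond loosely to the comparison methods the abstract names, although the paper's own definitions may differ.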

Funder

Foundation for the National Institutes of Health

Cancer Prevention and Research Institute of Texas

Publisher

IOP Publishing

Subject

Artificial Intelligence, Human-Computer Interaction, Software

Link

https://iopscience.iop.org/article/10.1088/2632-2153/abe528/pdf

References: 56 articles.

1. Baskar. Cancer and radiation therapy: current advances and future directions. Int. J. Med. Sci., 2012.

2. Taylor. Intensity-modulated radiotherapy: what is it? Cancer Imaging, 2004.