No compromise in solution quality: Speeding up belief-dependent continuous partially observable Markov decision processes via adaptive multilevel simplification

Published: 2024-09-06
ISSN: 0278-3649
Journal: The International Journal of Robotics Research
Language: English

Authors:

Andrey Zhitnikov¹, Ori Sztyglic², Vadim Indelman³

Affiliations:

1. Technion Autonomous Systems Program (TASP), Technion - Israel Institute of Technology, Haifa, Israel

2. Department of Computer Science, Technion - Israel Institute of Technology, Haifa, Israel

3. Department of Aerospace Engineering, Technion - Israel Institute of Technology, Haifa, Israel

Abstract

Continuous Partially Observable Markov Decision Processes (POMDPs) with general belief-dependent rewards are notoriously difficult to solve online. In this paper, we present a complete, provable theory of adaptive multilevel simplification for two settings: a given, externally constructed belief tree, and Monte Carlo Tree Search (MCTS), which constructs the belief tree on the fly using an exploration technique. Our theory makes it possible to accelerate POMDP planning with belief-dependent rewards without any sacrifice in the quality of the obtained solution. We rigorously prove each theoretical claim in the proposed unified theory. Building on these general theoretical results, we present three algorithms that accelerate online continuous POMDP planning with belief-dependent rewards. Two of them, SITH-BSP and LAZY-SITH-BSP, can be used on top of any method that constructs a belief tree externally. The third, SITH-PFT, is an anytime MCTS method that permits plugging in any exploration technique. All our methods are guaranteed to return exactly the same optimal action as their unsimplified counterparts. We replace the costly computation of information-theoretic rewards with novel adaptive upper and lower bounds, which we derive in this paper and which are of independent interest. We show that these bounds are easy to calculate and can be tightened on demand by our algorithms. Our approach is general: any bounds that converge monotonically to the reward can be used to achieve a significant speedup without any loss in performance. Our theory and algorithms support the challenging setting of continuous states, actions, and observations. The beliefs can be parametric or general, represented by weighted particles. We demonstrate in simulation a significant speedup in planning compared to baseline approaches, with guaranteed identical performance.
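The central idea of the abstract (replace an exact but expensive reward with bounds that are tightened only on demand, and stop as soon as the bounds separate the best action) can be illustrated with a minimal Python sketch. It is not SITH-BSP, LAZY-SITH-BSP, or SITH-PFT, and it ignores the belief tree entirely, looking at a single decision. It assumes a hypothetical bound oracle `bounds(action, level)` whose interval shrinks monotonically with the simplification level and collapses to the exact value at `max_level`; the names `select_action`, `toy_bounds`, and the values in `EXACT` are invented for illustration.

```python
from typing import Callable, Dict, List, Tuple

# Hypothetical bound oracle: bounds(action, level) -> (lower, upper).
# Assumption: the interval shrinks monotonically as `level` grows and
# collapses to the exact value at the finest level (max_level).
BoundFn = Callable[[str, int], Tuple[float, float]]


def select_action(actions: List[str], bounds: BoundFn,
                  max_level: int) -> Tuple[str, int]:
    """Return (action with the highest exact value, number of refinements).

    Bounds are tightened only for actions whose intervals still overlap
    the current best candidate, so the answer matches an exact argmax.
    """
    level: Dict[str, int] = {a: 0 for a in actions}
    lo: Dict[str, float] = {}
    hi: Dict[str, float] = {}
    for a in actions:
        lo[a], hi[a] = bounds(a, 0)

    refinements = 0
    while True:
        # Candidate with the best guaranteed (lower-bound) value.
        best = max(actions, key=lambda a: lo[a])
        # Rivals whose upper bound says they could still beat it.
        rivals = [a for a in actions if a != best and hi[a] > lo[best]]
        if not rivals:
            # lo[best] >= hi[a] >= exact(a) for every other action.
            return best, refinements
        # Tighten every overlapping action that is not yet exact.
        to_refine = [a for a in [best] + rivals if level[a] < max_level]
        if not to_refine:
            # Defensive: cannot happen when bounds are exact at max_level.
            return best, refinements
        for a in to_refine:
            level[a] += 1
            lo[a], hi[a] = bounds(a, level[a])
            refinements += 1


# Toy problem (invented numbers): exact values are used only to fabricate
# bounds whose half-width shrinks linearly to zero at MAX_LEVEL.
EXACT = {"left": 0.0, "straight": 3.0, "right": 2.6}
MAX_LEVEL = 4


def toy_bounds(action: str, level: int) -> Tuple[float, float]:
    half_width = (MAX_LEVEL - level) * 0.5
    value = EXACT[action]
    return value - half_width, value + half_width


print(select_action(list(EXACT), toy_bounds, MAX_LEVEL))  # ('straight', 9)
```

In this toy run, "straight" is identified after 9 refinement steps, compared with 12 if every action were refined to the exact level; "left" is pruned early because its upper bound falls below the leader's lower bound. The selected action matches an exact argmax because selection stops only when the chosen action's lower bound is at least every rival's upper bound.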

Funders

Israel Science Foundation

Zuckerman Fund to the Technion Artificial Intelligence Hub

Publisher

SAGE Publications

Link

https://journals.sagepub.com/doi/pdf/10.1177/02783649241261398
