Markov Decision Processes with Observation Costs: Framework and Computation with a Penalty Scheme

Published:2024-05-23
ISSN:0364-765X
Container-title:Mathematics of Operations Research
language:en
Short-container-title:Mathematics of OR

Author:

Reisinger Christoph¹, Tam Jonathan¹

Affiliation:

1. Mathematical Institute, University of Oxford, Oxford OX2 6GG, United Kingdom

Abstract

We consider Markov decision processes where the state of the chain is only given at chosen observation times and at a cost. Optimal strategies involve the optimization of observation times as well as the subsequent action values. We consider the finite horizon and discounted infinite horizon problems as well as an extension with parameter uncertainty. By including the time elapsed since the last observation as part of the augmented Markov system, the value function satisfies a system of quasivariational inequalities (QVIs). Such a class of QVIs can be seen as an extension to the interconnected obstacle problem. We prove a comparison principle for this class of QVIs, which implies the uniqueness of solutions to our proposed problem. Penalty methods are then utilized to obtain arbitrarily accurate solutions. Finally, we perform numerical experiments on three applications that illustrate our framework.

Funding: J. Tam is supported by the Engineering and Physical Sciences Research Council [Grant 2269738].

Publisher

Institute for Operations Research and the Management Sciences (INFORMS)

Link

https://pubsonline.informs.org/doi/pdf/10.1287/moor.2023.0172
