Ambiguous Dynamic Treatment Regimes: A Reinforcement Learning Approach

Author:

Soroush Saghafian1

Affiliation:

1. Harvard Kennedy School, Harvard University, Cambridge, Massachusetts 02138

Abstract

A central research goal in many studies is to use an observational data set to provide a new set of counterfactual guidelines that can yield causal improvements. Dynamic Treatment Regimes (DTRs) are widely studied as a way to formalize this process and enable researchers to find guidelines that are both personalized and dynamic. However, available methods for finding optimal DTRs often rely on assumptions that are violated in real-world applications (e.g., medical decision making or public policy), especially when (a) the existence of unobserved confounders cannot be ignored, and (b) the unobserved confounders are time varying (e.g., affected by previous actions). When such assumptions are violated, one often faces ambiguity about the underlying causal model that must be assumed to obtain an optimal DTR. This ambiguity is inevitable, because the dynamics of unobserved confounders and their causal impact on the observed part of the data cannot be inferred from the observed data. Motivated by a case study of finding superior treatment regimes for patients who underwent transplantation at our partner hospital (Mayo Clinic) and faced a medical condition known as new-onset diabetes after transplantation, we extend DTRs to a new class termed Ambiguous Dynamic Treatment Regimes (ADTRs), in which the causal impact of treatment regimes is evaluated based on a “cloud” of potential causal models. We then connect ADTRs to Ambiguous Partially Observable Markov Decision Processes (APOMDPs) proposed by Saghafian (2018), and treat unobserved confounders as latent variables with ambiguous dynamics and ambiguous causal effects on observed variables. Using this connection, we develop two reinforcement learning methods, termed Direct Augmented V-Learning (DAV-Learning) and Safe Augmented V-Learning (SAV-Learning), which use the observed data to effectively learn an optimal treatment regime.
We establish theoretical results for these learning methods, including (weak) consistency and asymptotic normality. We further evaluate their performance both in our case study (using clinical data) and in simulation experiments (using synthetic data). We find promising results for our proposed approaches: they perform well even compared with an imaginary oracle that knows both the true causal model (of the data-generating process) and the optimal regime under that model. Finally, we highlight that our approach enables a two-way personalization: the obtained treatment regimes can be personalized based on both patients’ characteristics and physicians’ preferences.

This paper was accepted by David Simchi-Levi, data science.

Supplemental Material: The data files and online appendix are available at https://doi.org/10.1287/mnsc.2022.00883.
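The abstract's core idea of judging a treatment regime against a "cloud" of candidate causal models can be illustrated with a toy worst-case (max-min) evaluation. This is only a minimal sketch of the ambiguity-averse principle, not the paper's DAV-Learning or SAV-Learning algorithms; the states, actions, and success probabilities below are hypothetical numbers chosen purely for illustration.

```python
# Toy sketch: evaluate fixed treatment rules against a "cloud" of candidate
# causal models and prefer the rule with the best worst-case value.
# All model parameters here are hypothetical, for illustration only.

# Each candidate model maps (state, action) -> probability of a good outcome.
candidate_models = [
    {("high_risk", "treat"): 0.70, ("high_risk", "wait"): 0.40},
    {("high_risk", "treat"): 0.55, ("high_risk", "wait"): 0.50},
    {("high_risk", "treat"): 0.60, ("high_risk", "wait"): 0.35},
]

def value(policy, model):
    """Expected outcome of `policy` under one candidate causal model."""
    action = policy("high_risk")
    return model[("high_risk", action)]

def robust_value(policy, models):
    """Worst-case value of a policy over the whole model cloud (max-min style)."""
    return min(value(policy, m) for m in models)

treat_policy = lambda state: "treat"
wait_policy = lambda state: "wait"

print(robust_value(treat_policy, candidate_models))  # 0.55
print(robust_value(wait_policy, candidate_models))   # 0.35
```

Under this ambiguity-averse criterion, "treat" would be preferred for the high-risk state because its worst case across the cloud (0.55) beats that of "wait" (0.35), even though no single model is assumed to be the true one.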

Publisher

Institute for Operations Research and the Management Sciences (INFORMS)

Subject

Management Science and Operations Research, Strategy and Management
