Some performance considerations when using multi-armed bandit algorithms in the presence of missing data

Published:2022-09-12 Issue:9 Volume:17 Page:e0274272
ISSN:1932-6203
Container-title:PLOS ONE
language:en
Short-container-title:PLoS ONE

Author:

Chen Xijin, Lee Kim May, Villar Sofia S., Robertson David S.

Abstract

When comparing the performance of multi-armed bandit algorithms, the potential impact of missing data is often overlooked. In practice, missing data also affects their implementation, where the simplest approach is to continue sampling according to the original bandit algorithm and ignore missing outcomes. We investigate how this approach affects the performance of several bandit algorithms through an extensive simulation study, assuming the rewards are missing at random. We focus on two-armed bandit algorithms with binary outcomes in the context of patient allocation for clinical trials with relatively small sample sizes. However, our results apply to other applications of bandit algorithms where missing data is expected to occur. We assess the resulting operating characteristics, including the expected reward, and consider different probabilities of missingness in both arms. The key finding of our work is that, when using the simplest strategy of ignoring missing data, the impact on the expected performance of multi-armed bandit strategies varies according to the way these strategies balance the exploration-exploitation trade-off. Algorithms that are geared towards exploration continue to assign samples to the arm with more missing responses: because that arm is perceived as having less observed information, the algorithm deems it more appealing than it otherwise would. In contrast, algorithms that are geared towards exploitation rapidly assign a high value to samples from the arms with a current high mean, irrespective of the number of observations per arm. Furthermore, for algorithms focusing more on exploration, we illustrate that the problem of missing responses can be alleviated using a simple mean imputation approach.
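
The setting described in the abstract can be illustrated with a minimal simulation sketch. It is not the authors' code: Thompson sampling is used only as a familiar example of a bandit algorithm, and the function name `simulate_trial`, the Beta(1, 1) prior, the parameter values, and the particular form of mean imputation (adding the arm's current observed success rate as a fractional pseudo-observation) are all assumptions made for illustration.

```python
import random


def simulate_trial(p_success, p_missing, n_patients, impute=False, seed=None):
    """Two-armed Bernoulli bandit with outcomes missing at random.

    Hypothetical helper, for illustration only.
    p_success: true success probability of each arm.
    p_missing: probability that an outcome on each arm is not recorded.
    impute:    False -> missing outcomes are ignored (no update);
               True  -> a missing outcome is replaced by the arm's current
                        observed mean (one assumed form of mean imputation).
    """
    rng = random.Random(seed)
    succ = [0.0, 0.0]      # successes used by the algorithm (observed + imputed)
    fail = [0.0, 0.0]      # failures used by the algorithm (observed + imputed)
    obs_succ = [0, 0]      # observed successes only
    observed = [0, 0]      # number of recorded outcomes per arm
    allocated = [0, 0]     # number of patients assigned per arm
    total_reward = 0

    for _ in range(n_patients):
        # Thompson sampling: draw from each arm's Beta posterior, pick the larger.
        draws = [rng.betavariate(1 + succ[a], 1 + fail[a]) for a in (0, 1)]
        arm = 0 if draws[0] >= draws[1] else 1
        allocated[arm] += 1

        outcome = 1 if rng.random() < p_success[arm] else 0
        total_reward += outcome  # the patient benefits even if the outcome is lost

        if rng.random() < p_missing[arm]:
            # Outcome is missing: either skip the update or impute the mean.
            if impute and observed[arm] > 0:
                mean = obs_succ[arm] / observed[arm]
                succ[arm] += mean
                fail[arm] += 1 - mean
        else:
            observed[arm] += 1
            obs_succ[arm] += outcome
            succ[arm] += outcome
            fail[arm] += 1 - outcome

    return {"allocated": allocated, "observed": observed, "reward": total_reward}


if __name__ == "__main__":
    # Assumed scenario: arm 1 is better but loses half of its outcomes.
    for impute in (False, True):
        result = simulate_trial([0.3, 0.6], [0.0, 0.5], n_patients=100,
                                impute=impute, seed=2022)
        print("impute =", impute, result)
```

Averaging `allocated` and `reward` over many seeds, rather than reading a single run, is what would be needed to compare the two strategies in the way an operating-characteristics study does.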

Funder

NIHR Cambridge Biomedical Research Centre

NIHR Maudsley Biomedical Research Centre

Medical Research Council

National Institute for Health Research

Publisher

Public Library of Science (PLoS)

Subject

Multidisciplinary

Cited by 1 article.

1. Response-Adaptive Randomization in Clinical Trials: From Myths to Practical Considerations;Statistical Science;2023-05-01