Implicit bias in Critical Care Data: Factors affecting sampling frequencies and missingness patterns of clinical and biological variables in ICU Patients

Published: 2024-06-10

Author:

Shi Junming (Seraphina), Hubbard Alan E., Fong Nicholas, Pirracchio Romain

Abstract

The presence of missing values in Electronic Health Records (EHRs) is a widespread and inescapable issue. Publicly available data sets mirror the incompleteness found in EHRs. Although the existing literature largely approaches missing data as a random phenomenon, the mechanisms behind these missing values are often not random with respect to important characteristics of the patients. Similarly, the sampling frequency of clinical or biological parameters is likely informative. The possible informative nature of patterns in missing data is often overlooked. For both missingness and sampling frequency, we hypothesize that the underlying mechanism may be at least consistent with implicit bias.

To investigate this important issue, we introduce a novel analytical framework designed to rigorously examine missing data and sampling frequency in EHRs. We utilize the MIMIC-III dataset as a case study, given its frequent use in training machine learning models for healthcare applications. Our approach incorporates Targeted Machine Learning (TML) to study the impact of a series of demographic variables, including protected attributes such as age, sex, race, and ethnicity, on the rate of missing data and sampling frequency for key clinical and biological variables in critical care settings. Our results expose underlying differences in the sampling frequency and missing data patterns of vital sign measurements and laboratory tests between different demographic groups. In addition, we find that these measurement patterns can provide significant predictive insights into patient outcomes. Consequently, we urge a reevaluation of the conventional understanding of missing data and sampling frequencies in EHRs. Acknowledging and addressing these biases is essential for advancing equitable and accurate healthcare through machine learning applications.

Publisher

Cold Spring Harbor Laboratory
