Characterizing and Managing Missing Structured Data in Electronic Health Records

Published: 2017-07-24

Author:

Beaulieu-Jones Brett K., Lavage Daniel R., Snyder John W., Moore Jason H., Pendergrass Sarah A., Bauer Christopher R.

Abstract

Missing data is a challenge for all studies; however, this is especially true for electronic health record (EHR) based analyses. Failure to appropriately consider missing data can lead to biased results. Here, we provide detailed procedures for when and how to conduct imputation of EHR data. We demonstrate how the mechanism of missingness can be assessed, evaluate the performance of a variety of imputation methods, and describe some of the most frequent problems that can be encountered. We analyzed clinical lab measures from 602,366 patients in the Geisinger Health System EHR. Using these data, we constructed a representative set of complete cases and assessed the performance of 12 different imputation methods for missing data that was simulated based on 4 mechanisms of missingness. Our results show that several methods including variations of Multivariate Imputation by Chained Equations (MICE) and softImpute consistently imputed missing values with low error; however, only a subset of the MICE methods were suitable for multiple imputation. The analyses described provide an outline of considerations for dealing with missing EHR data, steps that researchers can perform to characterize missingness within their own data, and an evaluation of methods that can be applied to impute clinical data. While the performance of methods may vary between datasets, the process we describe can be generalized to the majority of structured data types that exist in EHRs and all of our methods and code are publicly available.
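
The evaluation loop the abstract describes (start from complete cases, simulate missingness, impute, measure error on the hidden values) can be sketched roughly as follows. This is a minimal illustration and not the authors' code. The abstract does not name its four mechanisms, so the sketch assumes only missing completely at random (MCAR). Column-mean imputation is a simple stand-in baseline rather than MICE or softImpute, and the synthetic data and the helper names (`mask_mcar`, `impute_column_mean`, `rmse_on_masked`) are hypothetical.

```python
import math
import random


def mask_mcar(rows, frac, rng):
    """Copy rows, hiding each cell independently with probability frac (MCAR)."""
    return [[None if rng.random() < frac else v for v in row] for row in rows]


def impute_column_mean(rows):
    """Fill each missing cell with the mean of the observed values in its column."""
    n_cols = len(rows[0])
    means = []
    for j in range(n_cols):
        observed = [row[j] for row in rows if row[j] is not None]
        # Fall back to 0.0 only if a column has no observed values at all.
        means.append(sum(observed) / len(observed) if observed else 0.0)
    return [[means[j] if v is None else v for j, v in enumerate(row)] for row in rows]


def rmse_on_masked(truth, masked, imputed):
    """Root mean squared error over the cells that were hidden in `masked`."""
    squared = [
        (t - i) ** 2
        for t_row, m_row, i_row in zip(truth, masked, imputed)
        for t, m, i in zip(t_row, m_row, i_row)
        if m is None
    ]
    return math.sqrt(sum(squared) / len(squared)) if squared else 0.0


# Synthetic stand-in for a complete-case table of three lab measures (assumed values).
rng = random.Random(0)
complete = [
    [rng.gauss(100, 15), rng.gauss(4.5, 0.5), rng.gauss(140, 3)]
    for _ in range(200)
]

masked = mask_mcar(complete, 0.2, rng)   # hide roughly 20% of cells
imputed = impute_column_mean(masked)     # baseline imputation
error = rmse_on_masked(complete, masked, imputed)
print(f"RMSE on masked cells: {error:.3f}")
```

Because the columns sit on different scales, a pooled RMSE like this is dominated by the widest column; standardizing each column first, or reporting error per column, would give a fairer comparison between methods.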

Publisher

Cold Spring Harbor Laboratory

Cited by 1 article.

1. Imputing Missing Data in Electronic Health Records. Lecture Notes in Electrical Engineering, 2022.