Affiliation:
1. Division of Cardiovascular Medicine, Department of Internal Medicine, University of Utah School of Medicine, Salt Lake City, UT
2. Division of Epidemiology, Department of Internal Medicine, University of Utah School of Medicine, Salt Lake City, UT
3. Department of Population Health, University of Utah School of Medicine, Salt Lake City, UT
4. Section of Cardiology, Department of Internal Medicine, University of Chicago Pritzker School of Medicine, Chicago, IL
5. Department of Preventive Medicine, Northwestern University Feinberg School of Medicine, Chicago, IL
6. Centre for Digital Transformation of Health, Victorian Comprehensive Cancer Centre, Melbourne, Australia
Abstract
Background
Electronic medical records (EMRs) allow identification of disease‐specific patient populations, but varying electronic cohort definitions could result in different populations. We compared the characteristics of an electronic medical record–derived atrial fibrillation (AF) patient population using 5 different electronic cohort definitions.
Methods and Results
Adult patients with at least 1 AF billing code from January 1, 2010, to December 31, 2017, were included. Based on different electronic cohort definitions, we trained 5 different logistic regression models using a labeled training data set (n=786). Each model yielded a predicted probability; patients were classified as having AF if the probability was higher than a specified cut point. Test characteristics were calculated for each model. These models were then applied to the full cohort, and the resulting characteristics were compared. In the training set, the comprehensive model (including demographics, billing codes, and natural language processing results) performed best, with an area under the curve of 0.89, sensitivity of 0.90, and specificity of 0.87. Among a candidate population (n=22 000), the proportion of patients identified as having AF varied from 61% in the model using diagnosis or procedure International Classification of Diseases (ICD) billing codes to 83% in the model using natural language processing of clinical notes. Among identified AF patients, the proportion of patients with a CHA2DS2‐VASc score ≥2 varied from 69% to 85%; oral anticoagulant treatment rates varied from 50% to 66% depending on the model.
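The classification step described above (a logistic model produces a probability, a cut point turns it into an AF label, and test characteristics are computed against labeled data) can be illustrated with a minimal sketch. The abstract does not report the models' features, coefficients, or cut points, so every name, label, and number below is hypothetical toy data, not the authors' code or results. AUC is computed here with the rank-based (Mann-Whitney) formulation, one standard way to obtain it.

```python
import math


def predicted_probability(intercept, coefs, features):
    """Logistic model output: 1 / (1 + exp(-z)), computed in a numerically stable way.

    The intercept and coefficients are hypothetical; the abstract does not report them.
    """
    z = intercept + sum(b * x for b, x in zip(coefs, features))
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)  # avoids overflow in exp(-z) for large negative z
    return e / (1.0 + e)


def classify(probs, cut_point):
    """Label a patient as AF (1) when the predicted probability is higher than the cut point."""
    return [1 if p > cut_point else 0 for p in probs]


def sensitivity_specificity(y_true, y_pred):
    """Sensitivity = TP / (TP + FN); specificity = TN / (TN + FP)."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    tn = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 0)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    return tp / (tp + fn), tn / (tn + fp)


def auc(y_true, probs):
    """Probability that a random true-AF patient scores above a random non-AF patient (ties count 1/2)."""
    pos = [p for t, p in zip(y_true, probs) if t == 1]
    neg = [p for t, p in zip(y_true, probs) if t == 0]
    wins = 0.0
    for a in pos:
        for b in neg:
            if a > b:
                wins += 1.0
            elif a == b:
                wins += 0.5
    return wins / (len(pos) * len(neg))


# Hypothetical chart-review labels and model probabilities (toy data only).
y_true = [1, 1, 1, 0, 0, 0]
probs = [0.9, 0.7, 0.4, 0.6, 0.2, 0.1]

y_pred = classify(probs, cut_point=0.5)
sens, spec = sensitivity_specificity(y_true, y_pred)
print(f"proportion flagged as AF: {sum(y_pred) / len(y_pred):.2f}")  # 0.50
print(f"sensitivity={sens:.2f} specificity={spec:.2f} AUC={auc(y_true, probs):.2f}")  # 0.67, 0.67, 0.89
```

Swapping in probabilities from a model built on a different cohort definition, or moving the cut point, changes which patients are flagged and therefore the proportion labeled as AF; this is the same mechanism behind the variation across definitions that the abstract reports.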
Conclusions
Different electronic cohort definitions result in substantially different AF study samples. This difference threatens the quality and reproducibility of electronic medical record–based research and quality initiatives.
Publisher
Ovid Technologies (Wolters Kluwer Health)
Subject
Cardiology and Cardiovascular Medicine
Cited by
19 articles.