Data Mining Approach to Identify Disease Cohorts from Primary Care Electronic Medical Records: A Case of Diabetes Mellitus

Published:2017-12-12 Issue:1 Volume:10 Page:16-27
ISSN:1875-0362
Container-title:The Open Bioinformatics Journal
language:en
Short-container-title:TOBIOIJ

Author:

Owusu Adjah Ebenezer S., Montvida Olga, Agbeve Julius, Paul Sanjoy K.

Abstract

Background: Identifying patients with a disease from primary care electronic medical records (EMRs) poses methodological challenges that may affect epidemiologic inferences.

Objective: To compare deterministic, clinically guided selection algorithms with probabilistic machine learning (ML) methodologies for their ability to identify patients with type 2 diabetes mellitus (T2DM) from large population-based EMRs in a nationally representative primary care database.

Methods: Four cohorts of patients with T2DM were defined by a deterministic approach based on disease codes. The database was mined for a set of best predictors of T2DM, and the performance of six ML algorithms was compared based on cross-validated true positive rate, true negative rate, and area under the receiver operating characteristic curve.

Results: In the database of 11,018,025 research-suitable individuals, 379,657 (3.4%) were coded as having T2DM. The logistic regression classifier was selected as the best ML algorithm and resulted in a cohort of 383,330 patients with potential T2DM. Eighty-three percent (83%) of this cohort had a T2DM code, and 16% of the patients with a T2DM code were not included in this ML cohort. Of those in the ML cohort without a disease code, 52% had at least one measure of elevated glucose level and 22% had received at least one prescription for antidiabetic medication.

Conclusion: Deterministic cohort selection based on disease coding potentially introduces a significant misclassification problem. ML techniques allow testing for potential disease predictors and, given meaningful data input, can identify diseased cohorts in a holistic way.
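The three metrics named in the Methods (true positive rate, true negative rate, and area under the ROC curve) can be illustrated with a minimal sketch. This is not the authors' code: the function names, the 0.5 decision threshold, and the toy labels and scores below are all assumptions made for illustration. In a cross-validated comparison, values like these would presumably be computed on each held-out fold and averaged for each classifier.

```python
from typing import Sequence, Tuple


def confusion_rates(y_true: Sequence[int], y_pred: Sequence[int]) -> Tuple[float, float]:
    """Return (true positive rate, true negative rate) for binary 0/1 labels."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    tn = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 0)
    pos = sum(1 for t in y_true if t == 1)
    neg = len(y_true) - pos
    if pos == 0 or neg == 0:
        raise ValueError("need at least one positive and one negative label")
    return tp / pos, tn / neg


def roc_auc(y_true: Sequence[int], scores: Sequence[float]) -> float:
    """AUC as the probability that a random positive case outscores
    a random negative case, with ties counted as one half."""
    pos = [s for t, s in zip(y_true, scores) if t == 1]
    neg = [s for t, s in zip(y_true, scores) if t == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative label")
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0 for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


# Hypothetical toy data: 1 = patient has T2DM, 0 = does not.
labels = [1, 1, 1, 0, 0, 0, 0, 1]
scores = [0.9, 0.8, 0.4, 0.3, 0.6, 0.2, 0.1, 0.7]  # classifier probabilities
preds = [1 if s >= 0.5 else 0 for s in scores]      # assumed 0.5 threshold

tpr, tnr = confusion_rates(labels, preds)
auc = roc_auc(labels, scores)
print(f"TPR={tpr:.2f} TNR={tnr:.2f} AUC={auc:.4f}")
```

The pairwise AUC used here is quadratic in the number of cases, which is fine for a demonstration but would need a rank-based formulation on a database of this size.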

Publisher

Bentham Science Publishers Ltd.

Subject

Health Informatics, Biomedical Engineering, Computer Science (miscellaneous)

Link

https://openbioinformaticsjournal.com/contents/volumes/V10/TOBIOIJ-10-16/TOBIOIJ-10-16.pdf

References: 41 articles.

1. Sagreiya H, Altman RB. The utility of general purpose versus specialty clinical databases for research: Warfarin dose estimation from extracted clinical variables. J Biomed Inform 2010; 43 (5) : 747-51.

2. Shivade C, Raghavan P, Fosler-Lussier E, et al. A review of approaches to identifying patient phenotype cohorts using electronic health records. J Am Med Inform Assoc 2014; 21 (2) : 221-30.

3. Tate AR, Beloff N, Al-Radwan B, et al. Exploiting the potential of large databases of electronic health records for research using rapid search algorithms and an intuitive query interface. J Am Med Inform Assoc 2014; 21 (2) : 292-8.

4. Kandula S, Zeng-Treitler Q, Chen L, Salomon WL, Bray BE. A bootstrapping algorithm to improve cohort identification using structured data. J Biomed Inform 2011; 44 (Suppl. 1) : S63-8.

5. Sadek AR, Van Vlymen J, Khunti K, De Lusignan S. Automated identification of miscoded and misclassified cases of diabetes from computer records. Diabet Med 2012; 29 (3) : 410-4.

Cited by 21 articles.

1. Racial disparity in the co-occurrence of depression and type 2 diabetes mellitus. An electronic medical record study involving African American and White Caucasian adults from the US;Journal of Affective Disorders;2023-06

2. Robustness of Multiple Imputation Methods for Missing Risk Factor Data from Electronic Medical Records for Observational Studies;Journal of Healthcare Informatics Research;2022-09-10

3. Temporal trends in the prevalence and incidence of depression and the interplay of comorbidities in patients with young- and usual-onset type 2 diabetes from the USA and the UK;Diabetologia;2022-09-05

4. Cardiorenal Complications in Young-Onset Type 2 Diabetes Compared Between White Americans and African Americans;Diabetes Care;2022-07-26

5. Application of machine learning methods for the prediction of true fasting status in patients performing blood tests;Scientific Reports;2022-07-13