Clinical Annotation Research Kit (CLARK): Computable Phenotyping Using Machine Learning

Published: 2020-01-24 Issue: 1 Volume: 8 Page: e16042
ISSN: 2291-9694
Container-title: JMIR Medical Informatics
Language: en
Short-container-title: JMIR Med Inform

Author:

Pfaff Emily R, Crosskey Miles, Morton Kenneth, Krishnamurthy Ashok

Abstract

Computable phenotypes are algorithms that translate clinical features into code that can be run against electronic health record (EHR) data to define patient cohorts. However, computable phenotypes that use only structured EHR data do not capture the full richness of a patient’s medical record. While natural language processing (NLP) methods have shown success in extracting clinical features from text, the use of such tools has generally been limited to research groups with substantial NLP expertise. Our goal was to develop open-source phenotyping software, Clinical Annotation Research Kit (CLARK), that would enable clinical and translational researchers to use machine learning–based NLP for computable phenotyping without requiring deep informatics expertise. CLARK enables nonexpert users to mine text using machine learning classifiers by specifying features for the software to match in clinical notes. Once the features are defined, the CLARK interface lets the user choose among standard machine learning algorithms (linear support vector machine, Gaussian Naïve Bayes, decision tree, and random forest), cross-validation methods, and the number of folds (cross-validation splits) used to evaluate the classifier. Example phenotypes where CLARK has been applied include pediatric diabetes (sensitivity=0.91; specificity=0.98), symptomatic uterine fibroids (positive predictive value=0.81; negative predictive value=0.54), nonalcoholic fatty liver disease (sensitivity=0.90; specificity=0.94), and primary ciliary dyskinesia (sensitivity=0.88; specificity=1.0). In each of these use cases, CLARK allowed investigators to incorporate variables into their phenotype algorithm that would not be available as structured data. Moreover, the fact that nonexpert users can get started with machine learning–based NLP with limited informatics involvement is a significant improvement over the status quo.
We hope to disseminate CLARK to other organizations that may not have NLP or machine learning specialists available, enabling wider use of these methods.
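
The workflow the abstract describes (user-defined features matched in notes, a chosen number of cross-validation folds, and evaluation by sensitivity, specificity, and predictive values) can be illustrated with a minimal Python sketch. This is not CLARK's code: the feature names, regular expressions, and function names below are hypothetical, the fold split is a plain unshuffled one, and no classifier is trained. In practice a machine learning library would supply the classifier that sits between the feature step and the evaluation step.

```python
import re

# Hypothetical feature definitions (name -> regular expression).
# These are illustrative only and are not taken from CLARK.
FEATURES = {
    "insulin": re.compile(r"\binsulin\b", re.IGNORECASE),
    "a1c": re.compile(r"\b(?:hb)?a1c\b", re.IGNORECASE),
}


def featurize(note):
    """Count how often each feature pattern matches in one clinical note."""
    return [len(pattern.findall(note)) for pattern in FEATURES.values()]


def k_fold_indices(n_samples, n_folds):
    """Yield (train, test) index lists for unshuffled, interleaved k-fold splits."""
    for fold in range(n_folds):
        test = [i for i in range(n_samples) if i % n_folds == fold]
        train = [i for i in range(n_samples) if i % n_folds != fold]
        yield train, test


def evaluate(y_true, y_pred):
    """Return sensitivity, specificity, PPV, and NPV for binary (0/1) labels."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)
    tn = sum(1 for t, p in pairs if t == 0 and p == 0)
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)

    def ratio(num, den):
        # Undefined ratios (empty denominator) are reported as NaN.
        return num / den if den else float("nan")

    return {
        "sensitivity": ratio(tp, tp + fn),
        "specificity": ratio(tn, tn + fp),
        "ppv": ratio(tp, tp + fp),
        "npv": ratio(tn, tn + fn),
    }
```

Each note becomes a vector of match counts via `featurize`, each `(train, test)` pair from `k_fold_indices` would be used to fit and score a classifier, and `evaluate` turns the held-out predictions into the kind of metrics reported for the example phenotypes.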

Publisher

JMIR Publications Inc.

Subject

Health Information Management, Health Informatics

Cited by 8 articles.

1. Validation of a Computable Phenotype for Myocarditis/Pericarditis Following COVID-19 Vaccinations Using a Pilot Active Surveillance Electronic Healthcare Data Exchange Platform (Preprint);2023-11-15

2. Clinical Prediction Models for Hospital-Induced Delirium Using Structured and Unstructured Electronic Health Record Data: Protocol for a Development and Validation Study;JMIR Research Protocols;2023-11-09

3. Clinical Prediction Models for Hospital-Induced Delirium Using Structured and Unstructured Electronic Health Record Data: Protocol for a Development and Validation Study (Preprint);2023-04-26

4. Trends and opportunities in computable clinical phenotyping: A scoping review;Journal of Biomedical Informatics;2023-04

5. Leveraging Open Electronic Health Record Data and Environmental Exposures Data to Derive Insights Into Rare Pulmonary Disease;Frontiers in Artificial Intelligence;2022-06-28