Data-driven automated classification algorithms for acute health conditions: applying PheNorm to COVID-19 disease

Author:

Smith Joshua C (1), Williamson Brian D (2), Cronkite David J (2), Park Daniel (1), Whitaker Jill M (1), McLemore Michael F (1), Osmanski Joshua T (1), Winter Robert (1), Ramaprasan Arvind (2), Kelley Ann (2), Shea Mary (2), Wittayanukorn Saranrat (3), Stojanovic Danijela (3), Zhao Yueqin (3), Toh Sengwee (4), Johnson Kevin B (5), Aronoff David M (6), Carrell David S (2)

Affiliation:

1. Department of Biomedical Informatics, Vanderbilt University Medical Center, Nashville, TN 37203, United States

2. Kaiser Permanente Washington Health Research Institute, Seattle, WA 98101, United States

3. Center for Drug Evaluation and Research, US Food and Drug Administration, Silver Spring, MD 20903, United States

4. Harvard Pilgrim Health Care Institute, Boston, MA 02215, United States

5. Department of Biostatistics, Epidemiology and Informatics, University of Pennsylvania, Philadelphia, PA 19104, United States

6. Department of Medicine, Indiana University School of Medicine, Indianapolis, IN 46202, United States

Abstract

Objectives: Automated phenotyping algorithms can reduce development time and operator dependence compared to manually developed algorithms. One such approach, PheNorm, has performed well for identifying chronic health conditions, but its performance for acute conditions is largely unknown. Herein, we implement and evaluate PheNorm applied to symptomatic COVID-19 disease to investigate its feasibility for rapid phenotyping of acute health conditions.

Materials and methods: PheNorm is a general-purpose automated approach to creating computable phenotype algorithms based on natural language processing, machine learning, and low-cost silver-standard training labels. We applied PheNorm to cohorts of potential COVID-19 patients from 2 institutions. Using gold-standard manual chart review data, we investigated how performance was affected by alternative feature engineering options and by implementing externally trained models without local retraining.

Results: At quantiles of model-predicted risk that maximize F1, models achieved AUC, sensitivity, and positive predictive value of 0.853, 0.879, and 0.851 at one institution and 0.804, 0.976, and 0.885 at the other. We report performance metrics for all combinations of silver labels, feature engineering options, and models trained internally versus externally.

Discussion: Phenotyping algorithms developed using PheNorm performed well at both institutions. Performance varied with different silver-standard labels and feature engineering options. Models developed locally at one site also worked well when implemented externally at the other site.

Conclusion: PheNorm models successfully identified an acute health condition, symptomatic COVID-19. The simplicity of the PheNorm approach allows it to be applied at multiple study sites with substantially reduced overhead compared to traditional approaches.
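The abstract reports sensitivity and positive predictive value at the quantile of model-predicted risk that maximizes F1. As a rough illustration of what such a selection step could look like, the sketch below scans a grid of quantiles, thresholds the predicted risk at each one, and keeps the quantile with the highest F1 against gold-standard labels. This is not the authors' code: the function names, the quantile grid, and the toy risk scores and labels are all assumptions made up for the example.

```python
from typing import Dict, Sequence, Tuple


def quantile(values: Sequence[float], q: float) -> float:
    """Quantile of `values` at level q in [0, 1], using linear interpolation."""
    ordered = sorted(values)
    pos = q * (len(ordered) - 1)
    lo = int(pos)
    hi = min(lo + 1, len(ordered) - 1)
    return ordered[lo] + (pos - lo) * (ordered[hi] - ordered[lo])


def metrics_at_threshold(
    risk: Sequence[float], labels: Sequence[int], threshold: float
) -> Tuple[float, float, float]:
    """Return (sensitivity, PPV, F1) when risk >= threshold is called positive."""
    tp = sum(1 for r, y in zip(risk, labels) if r >= threshold and y == 1)
    fp = sum(1 for r, y in zip(risk, labels) if r >= threshold and y == 0)
    fn = sum(1 for r, y in zip(risk, labels) if r < threshold and y == 1)
    sensitivity = tp / (tp + fn) if (tp + fn) else 0.0
    ppv = tp / (tp + fp) if (tp + fp) else 0.0
    f1 = 2 * tp / (2 * tp + fp + fn) if (2 * tp + fp + fn) else 0.0
    return sensitivity, ppv, f1


def best_f1_quantile(
    risk: Sequence[float], labels: Sequence[int], quantiles: Sequence[float]
) -> Dict[str, float]:
    """Pick the quantile of predicted risk whose threshold maximizes F1."""
    best: Dict[str, float] = {}
    for q in quantiles:
        threshold = quantile(risk, q)
        sensitivity, ppv, f1 = metrics_at_threshold(risk, labels, threshold)
        if not best or f1 > best["f1"]:
            best = {
                "quantile": q,
                "threshold": threshold,
                "sensitivity": sensitivity,
                "ppv": ppv,
                "f1": f1,
            }
    return best


# Toy data (invented for illustration): predicted risks and chart-review labels.
toy_risk = [0.1, 0.2, 0.3, 0.4, 0.6, 0.7, 0.8, 0.9]
toy_labels = [0, 0, 1, 0, 1, 1, 1, 1]
grid = [k / 10 for k in range(1, 10)]  # hypothetical quantile grid: 0.1 ... 0.9

result = best_f1_quantile(toy_risk, toy_labels, grid)
print(result)
```

In practice, the grid resolution and whether the threshold is chosen on the same data used for evaluation are design choices that would affect the reported metrics; the sketch leaves both open.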

Funder

U.S. Food and Drug Administration

National Center for Advancing Translational Sciences

National Institutes of Health

Publisher

Oxford University Press (OUP)

Subject

Health Informatics
