PheW2P2V: a phenome-wide prediction framework with weighted patient representations using electronic health records

Author:

Guo Jia1ORCID,Kiryluk Krzysztof2,Wang Shuang1ORCID

Affiliation:

1. Department of Biostatistics, Columbia University , New York, NY 10032, United States

2. Department of Medicine, Columbia University , New York, NY 10032, United States

Abstract

Abstract Objective Electronic health records (EHRs) provide opportunities for the development of computable predictive tools. Conventional machine learning methods and deep learning methods have been widely used for this task, with the approach of usually designing one tool for one clinical outcome. Here we developed PheW2P2V, a Phenome-Wide prediction framework using Weighted Patient Vectors. PheW2P2V conducts tailored predictions for phenome-wide phenotypes using numeric representations of patients’ past medical records weighted based on their similarities with individual phenotypes. Materials and Methods PheW2P2V defines clinical disease phenotypes using Phecode mapping based on International Classification of Disease codes, which reduces redundancy and case-control misclassification in real-life EHR datasets. Through upweighting medical records of patients that are more relevant to a phenotype of interest in calculating patient vectors, PheW2P2V achieves tailored incidence risk prediction of a phenotype. The calculation of weighted patient vectors is computationally efficient, and the weighting mechanism ensures tailored predictions across the phenome. We evaluated prediction performance of PheW2P2V and baseline methods with simulation studies and clinical applications using the MIMIC-III database. Results Across 942 phenome-wide predictions using the MIMIC-III database, PheW2P2V has median area under the receiver operating characteristic curve (AUC-ROC) 0.74 (baseline methods have values ≤0.72), median max F1-score 0.20 (baseline methods have values ≤0.19), and median area under the precision-recall curve (AUC-PR) 0.10 (baseline methods have values ≤0.10). Discussion PheW2P2V can predict phenotypes efficiently by using medical concept embeddings and upweighting relevant past medical histories. By leveraging both labeled and unlabeled data, PheW2P2V reduces overfitting and improves predictions for rare phenotypes, making it a useful screening tool for early diagnosis of high-risk conditions, though further research is needed to assess the transferability of embeddings across different databases. Conclusions PheW2P2V is fast, flexible, and has superior prediction performance for many clinical disease phenotypes across the phenome of the MIMIC-III database compared to that of several popular baseline methods.

Funder

National Library of Medicine

Publisher

Oxford University Press (OUP)

同舟云学术

1.学者识别学者识别

2.学术分析学术分析

3.人才评估人才评估

"同舟云学术"是以全球学者为主线,采集、加工和组织学术论文而形成的新型学术文献查询和分析系统,可以对全球学者进行文献检索和人才价值评估。用户可以通过关注某些学科领域的顶尖人物而持续追踪该领域的学科进展和研究前沿。经过近期的数据扩容,当前同舟云学术共收录了国内外主流学术期刊6万余种,收集的期刊论文及会议论文总量共计约1.5亿篇,并以每天添加12000余篇中外论文的速度递增。我们也可以为用户提供个性化、定制化的学者数据。欢迎来电咨询!咨询电话:010-8811{复制后删除}0370

www.globalauthorid.com

TOP

Copyright © 2019-2024 北京同舟云网络信息技术有限公司
京公网安备11010802033243号  京ICP备18003416号-3