Evaluating resources composing the PheMAP knowledge base to enhance high-throughput phenotyping

Published: 2022-11-30; Volume: 30; Issue: 3; Pages: 456-465
ISSN: 1067-5027
Journal: Journal of the American Medical Informatics Association
Language: English

Authors:

Wan Nicholas C¹, Yaqoob Ali A², Ong Henry H², Zhao Juan², Wei Wei-Qi²

Affiliations:

1. Department of Biomedical Engineering, Vanderbilt University, Nashville, Tennessee, USA

2. Department of Biomedical Informatics, Vanderbilt University Medical Center, Nashville, Tennessee, USA

Abstract

Objective

A previous study, PheMAP, combined independent online resources to enable high-throughput phenotyping (HTP) using electronic health records (EHRs). However, these online resources describe diseases with differing quality, which may affect phenotyping performance. We aimed to evaluate the phenotyping performance of single resource-based PheMAPs and to investigate an optimized strategy for HTP.

Materials and Methods

For each resource, we ranked concept unique identifiers (CUIs) by term frequency-inverse document frequency (TF-IDF) and used Jaccard similarity matrices to compare the top-ranked CUIs across single resources and the original PheMAP. We correlated the top-ranked concepts from each resource with the features used in established Phenotype KnowledgeBase (PheKB) algorithms for hypothyroidism, type II diabetes mellitus (T2DM), and dementias. Using each resource separately, we calculated multiple phenotype risk scores for individuals from Vanderbilt University Medical Center's BioVU DNA Biobank and compared phenotyping performance against rule-based eMERGE algorithms. Lastly, we implemented an ensemble strategy that classified each patient's case/control status based on agreement among PheMAP resources.

Results

Jaccard similarity matrices indicate that the similarity of the CUIs composing single resource-based PheMAPs varies. Single resource-based PheMAPs generated from MedlinePlus and MedicineNet outperformed the others but cover only 81.6% of all disease phenotypes. We propose PheMAP-Ensemble, which provides higher average accuracy and precision than the combined average accuracy and precision of the single resource-based PheMAPs. While offering complete phenotype coverage, PheMAP-Ensemble significantly increases phenotyping recall compared with the original PheMAP.

Conclusions

The resources composing PheMAP yield different phenotyping performance when implemented individually. The ensemble method significantly improves the quality of PheMAP by fully utilizing dissimilar resources to capture accurate phenotyping data from EHRs.
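The TF-IDF ranking and Jaccard comparison described in the abstract's Materials and Methods can be sketched in Python. This is a minimal, non-authoritative illustration, not the authors' code. It assumes each resource is represented as a mapping from disease name to the list of CUIs extracted from that resource's description. It uses the plain tf * log(N / df) weighting (the exact TF-IDF variant used by PheMAP is not stated here), and it approximates the combined PheMAP by simply concatenating each disease's CUIs across resources. The resource contents and `cui_*` identifiers are hypothetical placeholders, not real UMLS CUIs, and the helper names (`tfidf_ranking`, `top_k_cuis`, `jaccard`, `merge_resources`) are illustrative only.

```python
import math
from collections import Counter
from itertools import combinations


def tfidf_ranking(docs: dict[str, list[str]], disease: str) -> list[tuple[str, float]]:
    """Rank one disease's CUIs by TF-IDF within a single resource.

    `docs` maps each disease to the CUIs extracted from this resource's
    description of it (repeats allowed, so term frequency is meaningful).
    """
    n_docs = len(docs)
    # Document frequency: number of disease descriptions containing each CUI.
    df = Counter(cui for cuis in docs.values() for cui in set(cuis))
    counts = Counter(docs[disease])
    total = sum(counts.values())
    scores = {
        cui: (count / total) * math.log(n_docs / df[cui])
        for cui, count in counts.items()
    }
    # Highest score first; ties broken alphabetically so results are deterministic.
    return sorted(scores.items(), key=lambda item: (-item[1], item[0]))


def top_k_cuis(docs: dict[str, list[str]], disease: str, k: int) -> set[str]:
    """Return the k highest-ranked CUIs for a disease as a set."""
    return {cui for cui, _ in tfidf_ranking(docs, disease)[:k]}


def jaccard(a: set[str], b: set[str]) -> float:
    """Size of the intersection over size of the union; 0.0 if both are empty."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def merge_resources(*resources: dict[str, list[str]]) -> dict[str, list[str]]:
    """Hypothetical stand-in for a combined PheMAP: concatenate CUIs per disease."""
    merged: dict[str, list[str]] = {}
    for resource in resources:
        for disease, cuis in resource.items():
            merged.setdefault(disease, []).extend(cuis)
    return merged


# Hypothetical resources; "cui_*" strings are placeholders, not real UMLS CUIs.
resource_a = {
    "hypothyroidism": ["cui_tsh", "cui_tsh", "cui_levothyroxine", "cui_fatigue"],
    "t2dm": ["cui_hba1c", "cui_metformin", "cui_fatigue"],
    "dementia": ["cui_memory_loss", "cui_donepezil", "cui_fatigue"],
}
resource_b = {
    "hypothyroidism": ["cui_tsh", "cui_goiter", "cui_fatigue"],
    "t2dm": ["cui_hba1c", "cui_insulin", "cui_fatigue"],
    "dementia": ["cui_memory_loss", "cui_confusion", "cui_fatigue"],
}

kbs = {
    "resource_a": resource_a,
    "resource_b": resource_b,
    "combined": merge_resources(resource_a, resource_b),
}
tops = {name: top_k_cuis(docs, "hypothyroidism", k=2) for name, docs in kbs.items()}

# Pairwise Jaccard similarity of the top-k sets (the upper triangle of the matrix).
for r1, r2 in combinations(tops, 2):
    print(r1, r2, round(jaccard(tops[r1], tops[r2]), 3))
```

With these toy inputs, the two single resources share only one of their top-2 hypothyroidism CUIs (Jaccard 0.333), illustrating how single resource-based PheMAPs can diverge. A CUI that appears in every disease description of a resource (here `cui_fatigue`) receives an IDF of zero and never ranks highly.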
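The ensemble step, which assigns case/control status from agreement among resources, can likewise be sketched. The scoring and voting rules below are assumptions for illustration. Each single resource-based PheMAP is treated as a dictionary of CUI weights. A patient's risk score is the sum of the weights of PheMAP concepts present in their record. A patient is called a case when at least `min_agree` resources score them at or above a shared `threshold`. The abstract does not specify the actual risk-score formula, thresholds, or agreement rule, so `risk_score`, `ensemble_label`, the weights, and the patient records are all hypothetical.

```python
def risk_score(phemap: dict[str, float], patient_cuis: set[str]) -> float:
    """Hypothetical risk score: sum of the weights of PheMAP concepts found in the record."""
    return sum(weight for cui, weight in phemap.items() if cui in patient_cuis)


def ensemble_label(
    phemaps: dict[str, dict[str, float]],
    patient_cuis: set[str],
    threshold: float,
    min_agree: int,
) -> str:
    """Call a case when at least `min_agree` resources score the patient >= threshold."""
    # Each resource casts one vote; True counts as 1 when summed.
    votes = sum(
        risk_score(phemap, patient_cuis) >= threshold for phemap in phemaps.values()
    )
    return "case" if votes >= min_agree else "control"


# Hypothetical single resource-based PheMAPs for one phenotype (CUI -> weight).
phemaps = {
    "resource_a": {"cui_tsh": 0.55, "cui_levothyroxine": 0.27},
    "resource_b": {"cui_tsh": 0.37, "cui_goiter": 0.37},
    "resource_c": {"cui_levothyroxine": 0.40, "cui_fatigue": 0.10},
}

# Hypothetical patients, each represented by the set of CUIs in their EHR.
patients = {
    "patient_1": {"cui_tsh", "cui_levothyroxine"},
    "patient_2": {"cui_goiter"},
    "patient_3": {"cui_fatigue"},
}

for pid, cuis in patients.items():
    print(pid, ensemble_label(phemaps, cuis, threshold=0.35, min_agree=2))
```

Here `patient_2` clears the threshold under only one resource and is labeled a control, whereas all three resources agree on `patient_1`. Requiring agreement guards against any one resource's idiosyncratic concept list; a plausible refinement, not confirmed by the source, would be to tune a separate threshold per resource.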

Funder

National Institutes of Health

Vanderbilt University Medical Center

National Center for Advancing Translational Sciences

Publisher

Oxford University Press (OUP)

Subject

Health Informatics

Link

https://academic.oup.com/jamia/article-pdf/30/3/456/49198743/ocac234.pdf
