Characterizing Variability of EHR-Driven Phenotype Definitions

Published: 2022-07-10

Author:

Brandt Pascal S., Kho Abel, Luo Yuan, Pacheco Jennifer A., Walunas Theresa L., Hakonarson Hakon, Hripcsak George, Liu Cong, Shang Ning, Weng Chunhua, Walton Nephi, Carrell David S., Crane Paul K., Larson Eric, Chute Christopher G., Kullo Iftikhar, Carroll Robert, Denny Josh, Ramirez Andrea, Wei Wei-Qi, Pathak Jyoti, Wiley Laura K., Richesson Rachel, Starren Justin B., Rasmussen Luke V.

Abstract

Objective: Analyze a publicly available sample of rule-based phenotype definitions to characterize and evaluate the types of logical constructs used.

Materials & Methods: A sample of 33 phenotype definitions used in research and published to the Phenotype KnowledgeBase (PheKB), that are represented using Fast Healthcare Interoperability Resources (FHIR) and Clinical Quality Language (CQL) was analyzed using automated analysis of the computable representation of the CQL libraries.

Results: Most of the phenotype definitions include narrative descriptions and flowcharts, while few provide pseudocode or executable artifacts. Most use 4 or fewer medical terminologies. The number of codes used ranges from 5 to 6865, and value sets from 1 to 19. We found the most common expressions used were literal, data, and logical expressions. Aggregate and arithmetic expressions are the least common. Expression depth ranges from 4 to 27.

Discussion: Despite the range of conditions, we found that all of the phenotype definitions consisted of logical criteria, representing both clinical and operational logic, and tabular data, consisting of codes from standard terminologies and keywords for natural language processing. The total number and variety of expressions is low, which may be to simplify implementation, or authors may limit complexity due to data availability constraints.

Conclusion: The phenotypes analyzed show significant variation in specific logical, arithmetic and other operators, but are all composed of the same high-level components, namely tabular data and logical expressions. A standard representation for phenotype definitions should support these formats and be modular to support localization and shared logic.
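To make the two metrics reported in the Results concrete (counts of expression types and expression depth), the following is a minimal Python sketch. It is not the authors' analysis code. It assumes a simplified, hypothetical tree format in which every expression node is a dictionary with a "type" key and child expressions appear as nested dictionaries or lists; the function names and the `example` tree are illustrative only.

```python
from collections import Counter


def expression_depth(node):
    """Return the nesting depth of an expression tree (a leaf expression has depth 1)."""
    if isinstance(node, dict):
        children = list(node.values())
    elif isinstance(node, list):
        children = node
    else:
        return 0  # plain strings and numbers are not expression nodes
    deepest_child = max((expression_depth(c) for c in children), default=0)
    # Only dictionaries that carry a "type" key count as one level of depth.
    own_level = 1 if isinstance(node, dict) and "type" in node else 0
    return own_level + deepest_child


def count_expression_types(node, counts=None):
    """Tally how often each expression type occurs anywhere in the tree."""
    if counts is None:
        counts = Counter()
    if isinstance(node, dict):
        if "type" in node:
            counts[node["type"]] += 1
        for child in node.values():
            count_expression_types(child, counts)
    elif isinstance(node, list):
        for child in node:
            count_expression_types(child, counts)
    return counts


# Hypothetical criterion: a condition exists AND at least 2 observations exist.
example = {
    "type": "And",
    "operand": [
        {"type": "Exists", "operand": {"type": "Retrieve", "dataType": "Condition"}},
        {
            "type": "GreaterOrEqual",
            "operand": [
                {"type": "Count", "source": {"type": "Retrieve", "dataType": "Observation"}},
                {"type": "Literal", "value": "2"},
            ],
        },
    ],
}

if __name__ == "__main__":
    print("depth:", expression_depth(example))
    print("types:", dict(count_expression_types(example)))
```

In this toy tree the deepest path runs And, GreaterOrEqual, Count, Retrieve, so the depth is 4, and Retrieve is the only type that occurs twice. A real analysis would need to follow the actual structure of the compiled CQL libraries rather than this simplified format.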

Publisher

Cold Spring Harbor Laboratory
