Abstract
AbstractBackgroundPhenotypes identified during dysmorphology physical examinations are critical to genetic diagnosis and nearly universally documented as free-text in the electronic health record (EHR). Variation in how phenotypes are recorded in free-text makes large-scale computational analysis extremely challenging. Existing natural language processing (NLP) approaches to address phenotype extraction are trained largely on the biomedical literature or on case vignettes rather than actual EHR data.MethodsWe implemented a tailored system at the Children’s Hospital of Philadelpia that allows clinicians to document dysmorphology physical exam findings. From the underlying data, we manually annotated a corpus of 3136 organ system observations using the Human Phenotype Ontology (HPO). We provide this corpus publicly. We trained a transformer based NLP system to identify HPO terms from exam observations. The pipeline includes an extractor, which identifies tokens in the sentence expected to contain an HPO term, and a normalizer, which uses those tokens together with the original observation to determine the specific term mentioned.FindingsWe find that our labeler and normalizer NLP pipeline, which we call PhenoID, achieves state-of-the-art performance for the dysmorphology physical exam phenotype extraction task. PhenoID’s performance on the test set was 0.717, compared to the nearest baseline system (Pheno-Tagger) performance of 0.633. An analysis of our system’s normalization errors shows possible imperfections in the HPO terminology itself but also reveals a lack of semantic understanding by our transformer models.InterpretationTransformers-based NLP models are a promising approach to genetic phenotype extraction and, with recent development of larger pre-trained causal language models, may improve semantic understanding in the future. We believe our results also have direct applicability to more general extraction of medical signs and symptoms.FundingUS National Institutes of Health
Publisher
Cold Spring Harbor Laboratory
Cited by
1 articles.
订阅此论文施引文献
订阅此论文施引文献,注册后可以免费订阅5篇论文的施引文献,订阅后可以查看论文全部施引文献