PhenoID, a language model normalizer of physical examinations from genetics clinical notes

Published: 2023-10-17

Authors:

Weissenbacher Davy, Rawal Siddharth, Zhao Xinwei, Priestley Jessica R. C., Szigety Katherine M., Schmidt Sarah F., Higgins Mary J., Magge Arjun, O’Connor Karen, Gonzalez-Hernandez Graciela, Campbell Ian M.

Abstract

Background: Phenotypes identified during dysmorphology physical examinations are critical to genetic diagnosis and are nearly universally documented as free text in the electronic health record (EHR). Variation in how phenotypes are recorded in free text makes large-scale computational analysis extremely challenging. Existing natural language processing (NLP) approaches to phenotype extraction are trained largely on the biomedical literature or on case vignettes rather than on actual EHR data.

Methods: We implemented a tailored system at the Children’s Hospital of Philadelphia that allows clinicians to document dysmorphology physical exam findings. From the underlying data, we manually annotated a corpus of 3136 organ system observations using the Human Phenotype Ontology (HPO). We provide this corpus publicly. We trained a transformer-based NLP system to identify HPO terms from exam observations. The pipeline includes an extractor, which identifies tokens in the sentence expected to contain an HPO term, and a normalizer, which uses those tokens together with the original observation to determine the specific term mentioned.

Findings: We find that our labeler and normalizer NLP pipeline, which we call PhenoID, achieves state-of-the-art performance for the dysmorphology physical exam phenotype extraction task. PhenoID scored 0.717 on the test set, compared with 0.633 for the nearest baseline system, PhenoTagger. An analysis of our system’s normalization errors shows possible imperfections in the HPO terminology itself but also reveals a lack of semantic understanding by our transformer models.

Interpretation: Transformer-based NLP models are a promising approach to genetic phenotype extraction and, with the recent development of larger pre-trained causal language models, may improve semantic understanding in the future. We believe our results also have direct applicability to more general extraction of medical signs and symptoms.

Funding: US National Institutes of Health

Publisher

Cold Spring Harbor Laboratory

References (20 articles)

1. The Human Phenotype Ontology in 2021

2. Fei Li, ZhiChao Lin, Meishan Zhang, and Donghong Ji. A span-based model for joint overlapped and discontinuous named entity recognition. In Proceedings of the 59th Annual Meeting of the Association for Computational Linguistics and the 11th International Joint Conference on Natural Language Processing (Volume 1: Long Papers), pages 4814–4828. Association for Computational Linguistics, 2021.

3. https://www.epic.com/about/. Last accessed September 13, 2023.

4. https://lhncbc.nlm.nih.gov/scrubber/. Last accessed September 11, 2023.

5. PhenoTagger: a hybrid method for phenotype concept recognition using human phenotype ontology. Bioinformatics, 2021.

Cited by 1 article.

1. An evaluation of GPT models for phenotype concept recognition. BMC Medical Informatics and Decision Making, 2024-01-31.