Identifying signs and symptoms of urinary tract infection from emergency department clinical notes using large language models

Author:

Iscoe Mark (1, 2), Socrates Vimig (2, 3), Gilson Aidan (4), Chi Ling (5), Li Huan (3), Huang Thomas (4), Kearns Thomas (1), Perkins Rachelle (1), Khandjian Laura (1), Taylor R. Andrew (1, 2)

Affiliation:

1. Department of Emergency Medicine Yale School of Medicine New Haven Connecticut USA

2. Section for Biomedical Informatics and Data Science Yale University School of Medicine New Haven Connecticut USA

3. Program of Computational Biology and Bioinformatics Yale University New Haven Connecticut USA

4. Yale School of Medicine New Haven Connecticut USA

5. Department of Biostatistics Yale School of Public Health New Haven Connecticut USA

Abstract

Background: Natural language processing (NLP) tools including recently developed large language models (LLMs) have myriad potential applications in medical care and research, including the efficient labeling and classification of unstructured text such as electronic health record (EHR) notes. This opens the door to large-scale projects that rely on variables that are not typically recorded in a structured form, such as patient signs and symptoms.

Objectives: This study is designed to acquaint the emergency medicine research community with the foundational elements of NLP, highlighting essential terminology, annotation methodologies, and the intricacies involved in training and evaluating NLP models. Symptom characterization is critical to urinary tract infection (UTI) diagnosis, but identification of symptoms from the EHR has historically been challenging, limiting large-scale research, public health surveillance, and EHR-based clinical decision support. We therefore developed and compared two NLP models to identify UTI symptoms from unstructured emergency department (ED) notes.

Methods: The study population consisted of patients aged ≥ 18 who presented to an ED in a northeastern U.S. health system between June 2013 and August 2021 and had a urinalysis performed. We annotated a random subset of 1250 ED clinician notes from these visits for a list of 17 UTI symptoms. We then developed two task-specific LLMs to perform the task of named entity recognition: a convolutional neural network-based model (SpaCy) and a transformer-based model designed to process longer documents (Clinical Longformer). Models were trained on 1000 notes and tested on a holdout set of 250 notes. We compared model performance (precision, recall, F1 measure) at identifying the presence or absence of UTI symptoms at the note level.

Results: A total of 8135 entities were identified in 1250 notes; 83.6% of notes included at least one entity. Overall F1 measure for note-level symptom identification weighted by entity frequency was 0.84 for the SpaCy model and 0.88 for the Longformer model. F1 measure for identifying presence or absence of any UTI symptom in a clinical note was 0.96 (232/250 correctly classified) for the SpaCy model and 0.98 (240/250 correctly classified) for the Longformer model.

Conclusions: The study demonstrated the utility of LLMs and transformer-based models in particular for extracting UTI symptoms from unstructured ED clinical notes; models were highly accurate for detecting the presence or absence of any UTI symptom on the note level, with variable performance for individual symptoms.
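The note-level evaluation described in the abstract (precision, recall, and F1 per symptom, combined into a frequency-weighted F1) can be illustrated with a minimal Python sketch. This is not the authors' code. The function name `note_level_f1`, the symptom labels `dysuria` and `fever`, and the toy notes are hypothetical, and the weighting here uses the number of gold-standard notes containing each symptom, which is an assumption about how "weighted by entity frequency" might be computed.

```python
def note_level_f1(gold, pred, labels):
    """Per-symptom precision/recall/F1 at the note level, plus a weighted F1.

    gold, pred: lists of sets, one set of symptom labels per note.
    Weighting by gold support per label is an assumption, not the paper's
    documented procedure.
    """
    per_label = {}
    support = {}
    for label in labels:
        tp = fp = fn = 0
        for g, p in zip(gold, pred):
            in_gold, in_pred = label in g, label in p
            if in_gold and in_pred:
                tp += 1          # symptom annotated and predicted in this note
            elif in_pred:
                fp += 1          # predicted but not annotated
            elif in_gold:
                fn += 1          # annotated but missed
        precision = tp / (tp + fp) if tp + fp else 0.0
        recall = tp / (tp + fn) if tp + fn else 0.0
        f1 = (2 * precision * recall / (precision + recall)
              if precision + recall else 0.0)
        per_label[label] = (precision, recall, f1)
        support[label] = tp + fn  # number of gold notes containing the label
    total = sum(support.values())
    weighted_f1 = (sum(per_label[l][2] * support[l] for l in labels) / total
                   if total else 0.0)
    return per_label, weighted_f1


# Toy data: four hypothetical notes, two hypothetical symptom labels.
gold = [{"dysuria", "fever"}, {"fever"}, set(), {"dysuria"}]
pred = [{"dysuria"}, {"fever"}, {"fever"}, {"dysuria"}]
per_label, weighted_f1 = note_level_f1(gold, pred, ["dysuria", "fever"])
print(per_label, weighted_f1)
```

Because each note contributes at most one count per symptom, repeated mentions of the same symptom within a note do not inflate the scores, which matches the presence-or-absence framing used in the abstract.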

Funder

Yale School of Medicine

Publisher

Wiley
