Large language models for accurate disease detection in electronic health records

Author:

Bürgisser NilsORCID,Chalot Etienne,Mehouachi Samia,Buclin Clement P.ORCID,Lauper KimORCID,Courvoisier Delphine S.ORCID,Mongin DenisORCID

Abstract

AbstractImportanceThe use of large language models (LLMs) in medicine is increasing, with potential applications in electronic health records (EHR) to create patient cohorts or identify patients who meet clinical trial recruitment criteria. However, significant barriers remain, including the extensive computer resources required, lack of performance evaluation, and challenges in implementation.ObjectiveThis study aims to propose and test a framework to detect disease diagnosis using a recent light LLM on French-language EHR documents. Specifically, it focuses on detecting gout (“goutte” in French), a ubiquitous French term that have multiple meanings beyond the disease. The study will compare the performance of the LLM-based framework with traditional natural language processing techniques and test its dependence on the parameter used.DesignThe framework was developed using a training and testing set of 700 paragraphs assessing “gout”, issued from a random selection of retrospective EHR documents. All paragraphs were manually reviewed and classified by two health-care professionals (HCP) into disease (true gout) and non-disease (gold standard). The LLM’s accuracy was tested using few-shot and chain-of-thought prompting and compared to a regular expression (regex)-based method, focusing on the effects of model parameters and prompt structure. The framework was further validated on 600 paragraphs assessing “Calcium Pyrophosphate Deposition Disease (CPPD)”.SettingThe documents were sampled from the electronic health-records of a tertiary university hospital in Geneva, Switzerland.ParticipantsAdults over 18 years of age.ExposureMeta’s Llama 3 8B LLM or traditional method, against a gold standard.Main Outcomes and MeasuresPositive and negative predictive value, as well as accuracy of tested models.ResultsThe LLM-based algorithm outperformed the regex method, achieving a 92.7% [88.7-95.4%] positive predictive value, a 96.6% [94.6-97.8%] negative predictive value, and an accuracy of 95.4% [93.6-96.7%] for gout. In the validation set on CPPD, accuracy was 94.1% [90.2-97.6%]. The LLM framework performed well over a wide range of parameter values.Conclusions and RelevanceLLMs were able to accurately detect disease diagnoses from EHRs, even in non-English languages. They could facilitate creating large disease registries in any language, improving disease care assessment and patient recruitment for clinical trials.Key pointsQuestionHow accurate and efficient are large language models (LLMs) in detecting diseases from unstructured electronic health records (EHR) text compared to traditional natural language processing techniques?FindingsThis study proposes a framework based on Meta’s Llama 3 8B, a recent public LLM, outperforming traditional natural language processing techniques in detecting gout and calcium pyrophosphate deposition disease in unstructured text. It achieves high positive and negative predictive values and accuracy. Performance was robust over a wide range of parameters.MeaningThe proposed framework can ease the use of LLMs in effectively detecting disease in EHR data for various clinical applications.

Publisher

Cold Spring Harbor Laboratory

同舟云学术

1.学者识别学者识别

2.学术分析学术分析

3.人才评估人才评估

"同舟云学术"是以全球学者为主线,采集、加工和组织学术论文而形成的新型学术文献查询和分析系统,可以对全球学者进行文献检索和人才价值评估。用户可以通过关注某些学科领域的顶尖人物而持续追踪该领域的学科进展和研究前沿。经过近期的数据扩容,当前同舟云学术共收录了国内外主流学术期刊6万余种,收集的期刊论文及会议论文总量共计约1.5亿篇,并以每天添加12000余篇中外论文的速度递增。我们也可以为用户提供个性化、定制化的学者数据。欢迎来电咨询!咨询电话:010-8811{复制后删除}0370

www.globalauthorid.com

TOP

Copyright © 2019-2024 北京同舟云网络信息技术有限公司
京公网安备11010802033243号  京ICP备18003416号-3