Evaluating machine learning approaches for multi-label classification of unstructured electronic health records with a generative large language model

Published: 2024-06-27

Author:

Vithanage Dinithi, Deng Chao, Wang Lei, Yin Mengyang, Alkhalaf Mohammad, Zhang Zhenyua, Zhu Yunshu, Soewargo Alan Christy, Yu Ping

Abstract

Multi-label classification of unstructured electronic health records (EHR) poses challenges due to the inherent semantic complexity of textual data. Advances in natural language processing (NLP) using large language models (LLMs) show promise in addressing these issues. Identifying the most effective machine learning method for EHR classification in real-world clinical settings is crucial. Therefore, this experimental research aims to test the effect of zero-shot and few-shot learning prompting strategies, with and without parameter-efficient fine-tuning (PEFT) of LLMs, on the multi-label classification of EHR data. The labels tested span four clinical classification tasks: agitation in dementia, depression in dementia, frailty index, and malnutrition risk factors. We utilise unstructured EHR data from residential aged care facilities (RACFs), employing the Llama 2-Chat 13B-parameter model as our generative LLM. Performance evaluation includes accuracy, precision, recall, and F1 score, supported by non-parametric statistical analyses. Results indicate the same level of performance across the four clinical tasks when the same prompting template is used, whether zero-shot or few-shot. Without PEFT, few-shot learning outperforms zero-shot learning. The study emphasises that fine-tuning significantly enhances the effectiveness of both zero-shot and few-shot learning. After PEFT, the performance of zero-shot learning reached the same level as few-shot learning. The analysis underscores that LLMs with PEFT for specific clinical tasks maintain their performance across diverse clinical tasks. These findings offer crucial insights for researchers, practitioners, and stakeholders utilising LLMs in clinical document analysis.
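
The difference between the two prompting strategies, and the way the reported metrics apply to multi-label output, can be illustrated with a minimal Python sketch. It is not the authors' code: the label names, the prompt wording, and the helper names build_prompt and multilabel_scores are all hypothetical, and micro-averaging with exact-match accuracy is an assumed convention that the abstract does not specify. Using the four task names as labels is an illustrative simplification.

```python
from typing import Dict, List, Sequence, Set, Tuple

# Hypothetical placeholder labels; the study's real label sets are not listed here.
LABELS = ["agitation", "depression", "frailty", "malnutrition_risk"]


def build_prompt(note: str, examples: Sequence[Tuple[str, Set[str]]] = ()) -> str:
    """Return a zero-shot prompt (no examples) or a few-shot prompt (with examples)."""
    lines = [
        "Read the clinical text and list every label that applies.",
        "Labels: " + ", ".join(LABELS),
        "",
    ]
    # Few-shot: prepend labelled demonstrations before the target note.
    for example_note, example_labels in examples:
        lines.append("Note: " + example_note)
        lines.append("Answer: " + ", ".join(sorted(example_labels)))
        lines.append("")
    lines.append("Note: " + note)
    lines.append("Answer:")
    return "\n".join(lines)


def multilabel_scores(gold: List[Set[str]], pred: List[Set[str]]) -> Dict[str, float]:
    """Micro-averaged precision, recall and F1, plus exact-match accuracy (assumed)."""
    tp = sum(len(g & p) for g, p in zip(gold, pred))  # labels correctly predicted
    fp = sum(len(p - g) for g, p in zip(gold, pred))  # predicted but not in gold
    fn = sum(len(g - p) for g, p in zip(gold, pred))  # in gold but missed
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    accuracy = sum(g == p for g, p in zip(gold, pred)) / len(gold) if gold else 0.0
    return {"accuracy": accuracy, "precision": precision, "recall": recall, "f1": f1}
```

Calling build_prompt with only a note yields a zero-shot prompt; passing a few (note, labels) pairs as examples yields a few-shot prompt with the same template. The model's parsed answers would then be compared with the gold label sets using multilabel_scores.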

Publisher

Cold Spring Harbor Laboratory

References (26 articles)

1. Health system-scale language models are all-purpose prediction engines. Nature, 2023.

2. Bhate, N.J., et al., Zero-shot Learning with Minimum Instruction to Extract Social Determinants and Family History from Clinical Notes using GPT Model. arXiv preprint arXiv:2309.05475, 2023.

3. Cardiovascular Care Innovation through Data-Driven Discoveries in the Electronic Health Record. Am J Cardiol, 2023.

4. Ge, J., et al., A comparison of large language model versus manual chart review for extraction of data elements from the electronic health record. medRxiv, 2023.

5. Ji, B., VicunaNER: Zero/Few-shot Named Entity Recognition using Vicuna. arXiv preprint arXiv:2305.03253, 2023.