LCD Benchmark: Long Clinical Document Benchmark on Mortality Prediction for Language Models

Published: 2024-03-27

Author:

Yoon WonJin, Chen Shan, Gao Yanjun, Zhao Zhanzhan, Dligach Dmitriy, Bitterman Danielle S., Afshar Majid, Miller Timothy

Abstract

Objective

The application of Natural Language Processing (NLP) in the clinical domain is important because clinical documents contain rich unstructured information that is often unavailable in structured data. When applying NLP methods to a given domain, benchmark datasets are crucial: they guide the selection of best-performing models and also enable assessment of the reliability of the generated outputs. Despite the recent availability of language models (LMs) capable of handling longer context, benchmark datasets targeting long clinical document classification tasks are absent.

Materials and Methods

To address this issue, we propose LCD benchmark, a benchmark for the task of predicting 30-day out-of-hospital mortality using discharge notes from MIMIC-IV and statewide death data. We evaluated this benchmark dataset using baseline models ranging from bag-of-words and CNN to instruction-tuned large language models. Additionally, we provide a comprehensive analysis of the model outputs, including manual review and visualization of model weights, to offer insights into their predictive capabilities and limitations.

Results and Discussion

In terms of F1, the best-performing supervised baseline reached 28.9% and GPT-4 reached 32.2%. Notes in our dataset have a median word count of 1687. Our analysis of the model outputs showed that our dataset is challenging for both models and human experts, but that the models can find meaningful signals in the text.

Conclusion

We expect our LCD benchmark to be a resource for the development of advanced supervised models, or prompting methods, tailored for clinical text. The benchmark dataset is available at https://github.com/Machine-Learning-for-Medical-Language/long-clinical-doc
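To make the reported F1 figures easier to interpret, the following is a minimal sketch of how F1 is computed for a binary mortality label. It assumes the score is taken on the positive class (1 = died within 30 days of discharge); the abstract does not state the averaging scheme, and the function name, label encoding, and toy data here are illustrative rather than taken from the benchmark code.

```python
def f1_score_binary(y_true, y_pred):
    """F1 on the positive class (assumed encoding: 1 = death within 30 days)."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)  # true positives
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)  # false positives
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)  # false negatives
    if tp == 0:
        # No correct positive prediction: define F1 as 0 to avoid dividing by zero.
        return 0.0
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)


# Toy, made-up labels with a minority positive class (not benchmark data).
y_true = [0, 0, 0, 0, 0, 0, 0, 1, 1, 1]
y_pred = [0, 0, 0, 0, 1, 1, 1, 1, 0, 0]

score = f1_score_binary(y_true, y_pred)
accuracy = sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)
print(f"F1 = {score:.3f}, accuracy = {accuracy:.3f}")
```

In this toy case half of the predictions are correct, yet F1 is only 2/7 (about 0.286), because the score ignores true negatives and penalizes both missed deaths and false alarms. On a task where the positive class is rare, F1 can therefore sit well below accuracy.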

Publisher

Cold Spring Harbor Laboratory
