Large Language Models in the Clinic: A Comprehensive Benchmark

Published: 2024-04-25

Author:

Liu Fenglin, Zhou Hongjian, Hua Yining, Rohanian Omid, Thakur Anshul, Clifton Lei, Clifton David A.

Abstract

The adoption of large language models (LLMs) to assist clinicians has attracted remarkable attention. Existing works mainly adopt the close-ended question-answering (QA) task with answer options for evaluation. However, many clinical decisions involve answering open-ended questions without pre-set options. To better understand LLMs in the clinic, we construct a benchmark, ClinicBench. We first collect eleven existing datasets covering diverse clinical language generation, understanding, and reasoning tasks. Furthermore, we construct six novel datasets and complex clinical tasks that are close to real-world practice, i.e., referral QA, treatment recommendation, hospitalization (long-document) summarization, patient education, pharmacology QA, and drug interaction for emerging drugs. We conduct an extensive evaluation of twenty-two LLMs under both zero-shot and few-shot settings. Finally, we invite medical experts to evaluate the clinical usefulness of LLMs.

Publisher

Cold Spring Harbor Laboratory

Cited by 2 articles.

1. Performance of Open-Source LLMs in Challenging Radiological Cases – A Benchmark Study on 4,049 Eurorad Case Reports;2024-09-06

2. Multi-step Transfer Learning in Natural Language Processing for the Health Domain;Neural Processing Letters;2024-05-20