Large language models leverage external knowledge to extend clinical insight beyond language boundaries

Published:2024-04-29 Issue:9 Volume:31 Page:2054-2064
ISSN:1067-5027
Container-title:Journal of the American Medical Informatics Association
language:en

Author:

Wu Jiageng¹, Wu Xian², Qiu Zhaopeng², Li Minghui¹, Lin Shixu¹, Zhang Yingying², Zheng Yefeng², Yuan Changzheng¹³, Yang Jie¹⁴⁵

Affiliation:

1. School of Public Health, Zhejiang University School of Medicine, Hangzhou, 310058, China

2. Jarvis Research Center, Tencent YouTu Lab, Beijing, 100101, China

3. Department of Nutrition, Harvard T.H. Chan School of Public Health, Boston, MA 02115, United States

4. Division of Pharmacoepidemiology and Pharmacoeconomics, Department of Medicine, Brigham and Women’s Hospital, Boston, MA 02115, United States

5. Harvard Medical School, Department of Medicine, Brigham and Women’s Hospital, Boston, MA 02115, United States

Abstract

Objectives: Large language models (LLMs) such as ChatGPT and Med-PaLM have excelled in various medical question-answering tasks. However, these English-centric models encounter challenges in non-English clinical settings, primarily because imbalanced training corpora leave them with limited clinical knowledge in the respective languages. We systematically evaluate LLMs in the Chinese medical context and develop a novel in-context learning framework to enhance their performance.

Materials and Methods: The latest China National Medical Licensing Examination (CNMLE-2022) served as the benchmark. We collected 53 medical books and 381 149 medical questions to construct the medical knowledge base and question bank. The proposed Knowledge and Few-shot Enhancement In-context Learning (KFE) framework leverages the in-context learning ability of LLMs to integrate diverse external clinical knowledge sources. We evaluated KFE with ChatGPT (GPT-3.5), GPT-4, Baichuan2-7B, Baichuan2-13B, and QWEN-72B on CNMLE-2022 and further investigated the effectiveness of different pathways for combining LLMs with medical knowledge from 7 distinct perspectives.

Results: Applied directly, ChatGPT failed to pass the CNMLE-2022, scoring 51. Combined with the KFE framework, LLMs of varying sizes showed consistent and significant improvements. ChatGPT's score rose to 70.04 and GPT-4 achieved the highest score of 82.59. Both scores surpass the qualification threshold (60) and exceed the average human score of 68.70, affirming the effectiveness and robustness of the framework. KFE also enabled the smaller Baichuan2-13B to pass the examination, showing great potential in low-resource settings.

Discussion and Conclusion: This study sheds light on the optimal practices for enhancing the capabilities of LLMs in non-English medical scenarios. By synergizing medical knowledge through in-context learning, LLMs can extend clinical insight beyond language barriers in healthcare, significantly reducing language-related disparities in LLM applications and ensuring global benefit in this field.
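The abstract describes KFE only at a high level: external knowledge and solved questions are placed into the prompt so the model can use them through in-context learning. The sketch below illustrates that general idea and is not the authors' implementation. The function names, the bag-of-words cosine similarity, the prompt layout, and the parameters `k_know` and `k_shot` are all assumptions made for illustration; the record does not state how the paper retrieves or formats its context.

```python
import math
import re
from collections import Counter


def tokenize(text):
    # Assumption: simple word tokens. Chinese text would need a proper
    # word segmenter, since \w+ keeps unbroken character runs together.
    return re.findall(r"\w+", text.lower())


def cosine(a, b):
    # Bag-of-words cosine similarity between two strings.
    ca, cb = Counter(tokenize(a)), Counter(tokenize(b))
    dot = sum(ca[t] * cb[t] for t in ca)
    norm_a = math.sqrt(sum(v * v for v in ca.values()))
    norm_b = math.sqrt(sum(v * v for v in cb.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def top_k(query, items, key, k):
    # Return the k items whose key text is most similar to the query.
    return sorted(items, key=lambda it: cosine(query, key(it)), reverse=True)[:k]


def build_prompt(question, knowledge_base, question_bank, k_know=1, k_shot=1):
    # Hypothetical prompt layout: retrieved knowledge, then solved
    # examples (few-shot), then the target question left open.
    snippets = top_k(question, knowledge_base, lambda s: s, k_know)
    shots = top_k(question, question_bank, lambda qa: qa["question"], k_shot)
    parts = ["Reference knowledge:"]
    parts += [f"- {s}" for s in snippets]
    parts.append("Solved examples:")
    for qa in shots:
        parts.append(f"Q: {qa['question']}\nA: {qa['answer']}")
    parts.append(f"Q: {question}\nA:")
    return "\n".join(parts)
```

Calling `build_prompt(question, knowledge_base, question_bank)` with a list of text snippets and a list of `{"question": ..., "answer": ...}` dictionaries returns a single string that can be sent to any chat or completion model. A production system would likely replace the toy similarity with a stronger retriever.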

Publisher

Oxford University Press (OUP)

Link

https://academic.oup.com/jamia/article-pdf/31/9/2054/58868121/ocae079.pdf

References: 75 articles.

1. Language models are unsupervised multitask learners;Radford;OpenAI Blog,2019

2. Large language models in medicine;Thirunavukarasu;Nat. Med,2023

Cited by 4 articles.

1. Large language models in biomedicine and health: current research landscape and future directions;Journal of the American Medical Informatics Association;2024-08-22

2. Characterizing Public Sentiments and Drug Interactions during COVID-19: A Pretrained Language Model and Network Analysis of Social Media Discourse (Preprint);2024-06-28

3. Characterizing Public Sentiments and Drug Interactions during COVID-19: A Pretrained Language Model and Network Analysis of Social Media Discourse;2024-06-06

4. Clinical Text Datasets for Medical Artificial Intelligence and Large Language Models — A Systematic Review;NEJM AI;2024-05-23