Me-LLaMA: Foundation Large Language Models for Medical Applications

Published: 2024-05-22

Author:

Xie Qianqian¹, Chen Qingyu², Chen Aokun³, Peng Cheng³, Hu Yan⁴, Lin Fongci¹, Peng Xueqing¹, Huang Jimin¹, Zhang Jeffrey¹, Keloth Vipina¹, Zhou Xinyu¹, He Huan⁵, Ohno-Machado Lucila¹, Wu Yonghui³, Xu Hua⁶, Bian Jiang³

Affiliation:

1. Yale University

2. School of Medicine, Yale University

3. University of Florida

4. School of Biomedical Informatics, University of Texas Health Science Center at Houston, Houston

5. Yale School of Medicine

6. University of Texas Health Science Center at Houston

Abstract

Recent advancements in large language models (LLMs) such as ChatGPT and LLaMA have hinted at their potential to revolutionize medical applications, yet their application in clinical settings often reveals limitations due to a lack of specialized training on medical-specific data. In response to this challenge, this study introduces Me-LLaMA, a novel medical LLM family that includes foundation models – Me-LLaMA 13/70B, along with their chat-enhanced versions – Me-LLaMA 13/70B-chat, developed through continual pre-training and instruction tuning of LLaMA2 using large medical datasets. Our methodology leverages a comprehensive domain-specific data suite, including a large-scale, continual pre-training dataset with 129B tokens, an instruction tuning dataset with 214k samples, and a new medical evaluation benchmark (MIBE) across six critical medical tasks with 12 datasets. Our extensive evaluation using the MIBE shows that Me-LLaMA models achieve overall better performance than existing open-source medical LLMs in zero-shot, few-shot and supervised learning abilities. With task-specific instruction tuning, Me-LLaMA models outperform ChatGPT on 7 out of 8 datasets and GPT-4 on 5 out of 8 datasets. In addition, we investigated the catastrophic forgetting problem, and our results show that Me-LLaMA models outperform other open-source medical LLMs in mitigating this issue. Me-LLaMA is one of the largest open-source medical foundation LLMs that use both biomedical and clinical data. It exhibits superior performance across both general and medical tasks compared to other open-source medical LLMs, rendering it an attractive choice for medical AI applications. We release our models, datasets, and evaluation scripts at: https://github.com/BIDS-Xu-Lab/Me-LLaMA.

Publisher

Research Square Platform LLC

Cited by 2 articles.

1. Call for papers: Special issue on biomedical multimodal large language models - novel approaches and applications; Journal of Biomedical Informatics; 2024-09

2. Optimizing large language models in digestive disease: strategies and challenges to improve clinical outcomes; Liver International; 2024-05-31