Development of a liver disease–specific large language model chat interface using retrieval-augmented generation

Author:

Ge Jin1ORCID,Sun Steve2,Owens Joseph2,Galvez Victor2,Gologorskaya Oksana23ORCID,Lai Jennifer C.1ORCID,Pletcher Mark J.4ORCID,Lai Ki2

Affiliation:

1. Department of Medicine, Division of Gastroenterology and Hepatology, University of California—San Francisco, San Francisco, California, USA

2. UCSF Health Information Technology, University of California—San Francisco, San Francisco, California, USA

3. Bakar Computational Health Sciences Institute, University of California—San Francisco, San Francisco, California, USA

4. Department of Epidemiology and Biostatistics, University of California—San Francisco, San Francisco, California, USA

Abstract

Background and Aims: Large language models (LLMs) have significant capabilities in clinical information processing tasks. Commercially available LLMs, however, are not optimized for clinical uses and are prone to generating hallucinatory information. Retrieval-augmented generation (RAG) is an enterprise architecture that allows the embedding of customized data into LLMs. This approach “specializes” the LLMs and is thought to reduce hallucinations. Approach and Results We developed “LiVersa,” a liver disease–specific LLM, by using our institution’s protected health information-complaint text embedding and LLM platform, “Versa.” We conducted RAG on 30 publicly available American Association for the Study of Liver Diseases guidance documents to be incorporated into LiVersa. We evaluated LiVersa’s performance by conducting 2 rounds of testing. First, we compared LiVersa’s outputs versus those of trainees from a previously published knowledge assessment. LiVersa answered all 10 questions correctly. Second, we asked 15 hepatologists to evaluate the outputs of 10 hepatology topic questions generated by LiVersa, OpenAI’s ChatGPT 4, and Meta’s Large Language Model Meta AI 2. LiVersa’s outputs were more accurate but were rated less comprehensive and safe compared to those of ChatGPT 4. Results: We evaluated LiVersa’s performance by conducting 2 rounds of testing. First, we compared LiVersa’s outputs versus those of trainees from a previously published knowledge assessment. LiVersa answered all 10 questions correctly. Second, we asked 15 hepatologists to evaluate the outputs of 10 hepatology topic questions generated by LiVersa, OpenAI’s ChatGPT 4, and Meta’s Large Language Model Meta AI 2. LiVersa’s outputs were more accurate but were rated less comprehensive and safe compared to those of ChatGPT 4. Conclusions: In this demonstration, we built disease-specific and protected health information-compliant LLMs using RAG. While LiVersa demonstrated higher accuracy in answering questions related to hepatology, there were some deficiencies due to limitations set by the number of documents used for RAG. LiVersa will likely require further refinement before potential live deployment. The LiVersa prototype, however, is a proof of concept for utilizing RAG to customize LLMs for clinical use cases.

Publisher

Ovid Technologies (Wolters Kluwer Health)

同舟云学术

1.学者识别学者识别

2.学术分析学术分析

3.人才评估人才评估

"同舟云学术"是以全球学者为主线,采集、加工和组织学术论文而形成的新型学术文献查询和分析系统,可以对全球学者进行文献检索和人才价值评估。用户可以通过关注某些学科领域的顶尖人物而持续追踪该领域的学科进展和研究前沿。经过近期的数据扩容,当前同舟云学术共收录了国内外主流学术期刊6万余种,收集的期刊论文及会议论文总量共计约1.5亿篇,并以每天添加12000余篇中外论文的速度递增。我们也可以为用户提供个性化、定制化的学者数据。欢迎来电咨询!咨询电话:010-8811{复制后删除}0370

www.globalauthorid.com

TOP

Copyright © 2019-2024 北京同舟云网络信息技术有限公司
京公网安备11010802033243号  京ICP备18003416号-3