Performance of ChatGPT-4o in Real-Time Medical Consultation for Retroperitoneal Fibrosis Patients Under Doctor Supervision: A Cross-Sectional Study in a Chinese Clinical Setting (Preprint)

Authors:

Gao Hui, Zhang Wuji, Liu Shibo, Li Yuanning, Zhu Yingxi, Long Ting, Yu Ruohan, Guo Qian, Zou Yadan, Li Ji, Zhang Lina, Yang Cui, Tong Yubing, Zhang Xuewu

Abstract

BACKGROUND

Large language models (LLMs) such as GPT-4 show promise in medical consultations but face challenges in non-English or real-time contexts. The new GPT-4o, with improved text processing and faster responses, may better address rare diseases such as retroperitoneal fibrosis (RPF).

OBJECTIVE

The performance of GPT-4o in providing real-time medical consultations for patients with rare diseases, a general challenge in clinical practice, remains underexplored. We evaluated the competency of GPT-4o in responding to patient questions about RPF, a rare autoimmune disease, rating accuracy, completeness, readability, and quality on a 7-point Likert scale.

METHODS

A total of 103 real-world queries from patients with RPF were collected from diverse sources. Responses were generated using the newly released version of GPT-4o (2024/5/17). All questions were also stratified and randomly divided into six groups. Each of six attending rheumatologists was assigned one set of questions to answer and then generated new responses with the assistance of GPT-4o. All responses were assessed blindly by three experts in RPF.
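The stratified random division of questions into six groups can be illustrated with a minimal sketch. The abstract does not state the stratification variable or the randomization procedure, so the "simple"/"complex" labels, the function name `stratified_split`, and the round-robin assignment below are all assumptions made for illustration, not the authors' method.

```python
import random
from collections import defaultdict


def stratified_split(items, strata, n_groups=6, seed=0):
    """Randomly assign items to n_groups, spreading each stratum evenly.

    Hypothetical helper: shuffles within each stratum, then deals items
    round-robin so group sizes differ by at most one.
    """
    rng = random.Random(seed)
    by_stratum = defaultdict(list)
    for item, stratum in zip(items, strata):
        by_stratum[stratum].append(item)

    groups = [[] for _ in range(n_groups)]
    next_group = 0  # carried across strata to keep overall sizes balanced
    for stratum in sorted(by_stratum):
        members = by_stratum[stratum]
        rng.shuffle(members)
        for item in members:
            groups[next_group % n_groups].append(item)
            next_group += 1
    return groups


# Assumed inputs: 103 question IDs with made-up complexity labels.
question_ids = list(range(1, 104))
categories = ["complex" if q % 3 == 0 else "simple" for q in question_ids]

groups = stratified_split(question_ids, categories)
print([len(g) for g in groups])
```

With 103 questions and six groups, any balanced split yields one group of 18 and five groups of 17; the fixed seed only makes the assignment reproducible.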

RESULTS

GPT-4o scored significantly higher than rheumatologists in accuracy (6.39 ± 0.50 vs. 4.99 ± 0.62), completeness (6.51 ± 0.44 vs. 4.55 ± 0.60), readability (6.45 ± 0.42 vs. 4.93 ± 0.59), and quality (6.42 ± 0.46 vs. 4.78 ± 0.55) (p < 0.001). The competency of rheumatologists + GPT-4o was better than that of rheumatologists alone (accuracy: 6.13 ± 0.63; completeness: 5.99 ± 0.81; readability: 6.05 ± 0.67; quality: 6.01 ± 0.71; p < 0.001), and physician revisions generally reduced the competency of GPT-4o. Subgroup analysis showed no significant difference in accuracy between GPT-4o and rheumatologists + GPT-4o in answering complex questions, but any type of revision lowered the competency of GPT-4o.

CONCLUSIONS

GPT-4o has the potential to provide real-time medical consultations for RPF in the Chinese clinical environment.

Publisher

JMIR Publications Inc.
