ChatGPT4’s proficiency in addressing patients’ questions on systemic lupus erythematosus: a blinded comparative study with specialists

Authors:

Xu Dan (1), Zhao Jinxia (1), Liu Rui (1), Dai Yijun (2), Sun Kai (3), Wong Priscilla (4), Ming Samuel Lee Shang (5), Wearn Koh Li (5), Wang Jiangyuan (6), Xie Shasha (6), Zeng Lin (7), Mu Rong (1), Xu Chuanhui (5, 8)

Affiliations:

1. Department of Rheumatology and Immunology, Peking University Third Hospital, Beijing, China

2. Department of Rheumatology and Immunology, Fujian Provincial Hospital, Fuzhou, China

3. Department of Medicine, Division of Rheumatology and Immunology, Duke University, Durham, North Carolina, USA

4. Department of Medicine and Therapeutics, Prince of Wales Hospital, The Chinese University of Hong Kong, Hong Kong, China

5. Department of Rheumatology, Allergy and Immunology, Tan Tock Seng Hospital, Singapore, Singapore

6. Beijing Kidney Health Technology Co., Ltd, Beijing, China

7. Research Center of Clinical Epidemiology, Peking University Third Hospital, Beijing, China

8. Lee Kong Chian School of Medicine, Nanyang Technological University, Singapore, Singapore

Abstract

Objectives: The efficacy of artificial intelligence (AI)-driven chatbots such as ChatGPT4 in specialized medical consultations, particularly in rheumatology, remains underexplored. This study compared the proficiency of ChatGPT4's responses with that of practicing rheumatologists in addressing inquiries from patients with systemic lupus erythematosus (SLE).

Methods: In this cross-sectional study, we curated 95 frequently asked questions (FAQs), 55 in Chinese and 40 in English. Responses to the FAQs from ChatGPT4 and from five rheumatologists were scored separately by a panel of rheumatologists and a group of patients with SLE across six domains (scientific validity, logical consistency, comprehensibility, completeness, satisfaction level and empathy) on a 0–10 scale, where 0 indicates an entirely incorrect response and 10 an accurate and comprehensive one.

Results: By the rheumatologists' scoring, ChatGPT4-generated responses outperformed those from rheumatologists in satisfaction level and empathy, with mean differences of 0.537 (95% CI, 0.252–0.823; P < 0.01) and 0.460 (95% CI, 0.227–0.693; P < 0.01), respectively. From the SLE patients' perspective, ChatGPT4-generated responses were comparable to the rheumatologist-provided answers in all six domains. Subgroup analysis showed that ChatGPT4 responses were more logically consistent and complete regardless of language, and exhibited greater comprehensibility, satisfaction and empathy for the Chinese FAQs; however, they were inferior in comprehensibility for the English FAQs.

Conclusion: ChatGPT4's answers to FAQs from patients with SLE were comparable to, and in certain domains possibly better than, those provided by specialists. This study shows the potential of applying ChatGPT4 to improve consultations for patients with SLE.
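To make the reported statistics concrete, below is a minimal sketch in Python of how a mean difference with a 95% CI and P-value between two sets of 0–10 scores can be computed. The abstract does not state which statistical test the authors used, so a paired t-test over per-FAQ scores is assumed here (each FAQ yields one ChatGPT4 response and one specialist response, i.e. matched pairs), and the generated ratings are illustrative placeholders rather than study data.

import numpy as np
from scipy import stats

rng = np.random.default_rng(0)
n_questions = 95  # 55 Chinese + 40 English FAQs, as in the study

# Hypothetical 0-10 ratings, one pair per FAQ (placeholders, not study data).
chatgpt_scores = np.clip(rng.normal(8.0, 1.0, n_questions), 0, 10)
specialist_scores = np.clip(rng.normal(7.5, 1.0, n_questions), 0, 10)

diff = chatgpt_scores - specialist_scores
mean_diff = diff.mean()
sem = stats.sem(diff)

# 95% CI for the mean difference, from the t distribution with n-1 df.
ci_low, ci_high = stats.t.interval(0.95, df=n_questions - 1,
                                   loc=mean_diff, scale=sem)
# Paired t-test, since each FAQ is rated for both responders.
t_stat, p_value = stats.ttest_rel(chatgpt_scores, specialist_scores)

print(f"mean difference: {mean_diff:.3f} "
      f"(95% CI, {ci_low:.3f} to {ci_high:.3f}; P = {p_value:.3g})")

A positive mean difference whose CI excludes zero would correspond to the kind of result reported above, e.g. 0.537 (95% CI, 0.252–0.823; P < 0.01) for satisfaction level.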

Funders

National Natural Science Foundation of China

NMRC Clinician-Scientist Individual Research

NHG-LKCMedicine Clinician-Scientist Career Scheme

National Center for Advancing Translational Sciences

National Institutes of Health

American Heart Association COVID-19 Fund to Retain Clinical Scientists

Publisher

Oxford University Press (OUP)

Cited by 1 article.
