A Comparative Analysis of Large Language Models on Clinical Questions for Autoimmune Diseases

Published: 2024-08-27

Author:

Zhang Weiming¹, Yu Jie¹, Ma Juntao¹, Feng Jiawei¹, Geng Linyu¹, Chen Yuxin¹, Zhang Huayong¹, Ning Mingzhe¹

Affiliation:

1. Nanjing Drum Tower Hospital, Nanjing University

Abstract

Background: Artificial intelligence (AI) has made great strides. This study evaluated the performance of AI chatbots in answering clinical questions related to autoimmune diseases (AIDs).

Methods: A total of 46 AIDs-related questions were compiled and entered into ChatGPT 3.5, ChatGPT 4.0, and Gemini. The replies were collected and sent to laboratory specialists, who scored them for relevance, correctness, completeness, helpfulness, and safety. The scores of the three chatbots in the five quality dimensions, and the scores of the replies to each question under each quality dimension, were analyzed.

Results: ChatGPT 4.0 outperformed ChatGPT 3.5 and Gemini in all five quality dimensions. ChatGPT 4.0 outperformed ChatGPT 3.5 or Gemini in relevance, completeness, or helpfulness when answering questions about the prognosis, diagnosis, or report interpretation of AIDs. ChatGPT 4.0's replies were the longest, followed by those of ChatGPT 3.5; Gemini's were the shortest.

Conclusions: Our findings indicate that ChatGPT 4.0 is superior in delivering comprehensive and accurate responses to AIDs-related clinical questions.

Publisher

Springer Science and Business Media LLC

References (23 articles)

1. Advances in natural language processing; Hirschberg J; Science, 2015

2. ChatGPT and the rise of large language models: the new AI-driven infodemic threat in public health; Angelis L; Front Public Health, 2023

3. Benefits, Limits, and Risks of GPT-4 as an AI Chatbot for Medicine; Lee P; N Engl J Med, 2023

4. GPT-4 is here: what scientists think; Sanderson K; Nature, 2023

5. Assessing the accuracy, usefulness, and readability of artificial-intelligence-generated responses to common dermatologic surgery questions for patient education: A double-blinded comparative study of ChatGPT and Google Bard; Robinson MA; J Am Acad Dermatol, 2024