Comparative Diagnostic Accuracy Between ChatGPT-4 With versus Without Vision in Clinical Descriptions: an experimental study (Preprint)

Published: 2023-12-18

Author:

Hirosawa Takanobu, Harada Yukinori, Tokumasu Kazuki, Ito Takahiro, Suzuki Tomoharu, Shimizu Taro

Abstract

BACKGROUND

Several multimodal generative artificial intelligence (AI) systems, including ChatGPT-4 with vision (also known as ChatGPT-4V or ChatGPT-4Vision), accept image data together with text data. However, it is unknown how adding image data changes the diagnostic accuracy of ChatGPT-4.

OBJECTIVE

We compared the diagnostic accuracy of ChatGPT-4 with vision, given text and images as input (intervention), with that of ChatGPT-4 without vision, given text only (control), on case descriptions derived from case reports.

METHODS

We used a dataset of case descriptions and final diagnoses derived from case reports published in the American Journal of Case Reports from January 2022 to March 2023. We also extracted the figures and tables mentioned in the case descriptions as image data. We excluded nondiagnostic case reports, pediatric case reports, and case reports without figures or tables in their case descriptions. From the case descriptions and images, ChatGPT-4 with vision generated differential-diagnosis lists. We compared its diagnostic accuracy with that of ChatGPT-4 without vision, which received the same case descriptions without images. Two physicians independently evaluated whether the final diagnosis was included in each list. Discrepancies were resolved by another physician.

RESULTS

A total of 363 case descriptions were included. The final diagnosis appeared within the top 10 differential-diagnosis list generated by ChatGPT-4 with vision in 85.1% (309/363) of cases, which was not significantly different from 87.9% (319/363) for ChatGPT-4 without vision (P=.33). The final diagnosis was the top diagnosis generated by ChatGPT-4 with vision in 44.4% (161/363) of cases, lower than 55.9% (203/363) for ChatGPT-4 without vision (P=.002).
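
The reported rates can be recomputed from the counts above. The abstract does not state which statistical test produced the P values, so the sketch below assumes a 2x2 chi-square test with Yates continuity correction, applied to the marginal counts only. The function names `rate_percent` and `yates_chi2_p` are hypothetical helpers introduced for illustration, not part of the study.

```python
import math


def rate_percent(hits: int, total: int) -> float:
    """Return hits/total as a percentage rounded to one decimal place."""
    return round(100 * hits / total, 1)


def yates_chi2_p(hits_a: int, hits_b: int, total: int) -> float:
    """Two-sided P value from a 2x2 chi-square test with Yates correction.

    ASSUMPTION: the abstract does not name its test; this is one plausible
    choice. Both arms are assumed to be scored on the same number of cases.
    """
    a, b = hits_a, total - hits_a  # arm A: diagnosis included / not included
    c, d = hits_b, total - hits_b  # arm B: diagnosis included / not included
    n = a + b + c + d
    corrected = max(abs(a * d - b * c) - n / 2, 0.0)
    chi2 = n * corrected ** 2 / ((a + b) * (c + d) * (a + c) * (b + d))
    # With 1 degree of freedom, the chi-square survival function
    # equals erfc(sqrt(x / 2)).
    return math.erfc(math.sqrt(chi2 / 2))


TOTAL = 363  # included case descriptions

# Top 10 list: with vision (309) versus without vision (319)
top10_p = yates_chi2_p(309, 319, TOTAL)
# Top diagnosis: with vision (161) versus without vision (203)
top1_p = yates_chi2_p(161, 203, TOTAL)

print(rate_percent(309, TOTAL), rate_percent(319, TOTAL), round(top10_p, 2))
print(rate_percent(161, TOTAL), rate_percent(203, TOTAL), round(top1_p, 3))
```

Under this assumed test, the recomputed P values land close to the reported .33 and .002. Because both arms used the same 363 cases, a paired test would need the per-case results, which the abstract does not give.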

CONCLUSIONS

The rates of final diagnoses within the differential-diagnosis lists generated by ChatGPT-4 with vision were not improved compared to those without vision. The rate of final diagnoses as the top diagnosis generated by ChatGPT-4 with vision was inferior to that without vision. These results suggest that a multimodal generative AI system, ChatGPT-4 with vision, mainly relies on the text data, even though it accepts image data for generating differentials. Multimodal generative AI systems should be further developed to improve diagnostic performance through better integration of clinical data before being utilized in medicine.

CLINICALTRIAL

Not applicable

Publisher

JMIR Publications Inc.
