Feasibility of Multimodal Artificial Intelligence Using GPT-4 Vision for the Classification of Middle Ear Disease: Qualitative Study and Validation

Published:2024-05-31 Issue: Volume:3 Page:e58342
ISSN:2817-1705
Container-title:JMIR AI
language:en
Short-container-title:JMIR AI

Author:

Noda Masao, Yoshimura Hidekane, Okubo Takuya, Koshu Ryota, Uchiyama Yuki, Nomura Akihiro, Ito Makoto, Takumi Yutaka

Abstract

Background: The integration of artificial intelligence (AI), particularly deep learning models, has transformed the landscape of medical technology, especially in the field of diagnosis using imaging and physiological data. In otolaryngology, AI has shown promise in image classification for middle ear diseases. However, existing models often lack patient-specific data and clinical context, limiting their universal applicability. The emergence of GPT-4 Vision (GPT-4V) has enabled a multimodal diagnostic approach, integrating language processing with image analysis.

Objective: In this study, we investigated the effectiveness of GPT-4V in diagnosing middle ear diseases by integrating patient-specific data with otoscopic images of the tympanic membrane.

Methods: The design of this study was divided into two phases: (1) establishing a model with appropriate prompts and (2) validating the ability of the optimal prompt model to classify images. In total, 305 otoscopic images of 4 middle ear diseases (acute otitis media, middle ear cholesteatoma, chronic otitis media, and otitis media with effusion) were obtained from patients who visited Shinshu University or Jichi Medical University between April 2010 and December 2023. The optimized GPT-4V settings were established using prompts and patients’ data, and the model created with the optimal prompt was used to verify the diagnostic accuracy of GPT-4V on 190 images. To compare the diagnostic accuracy of GPT-4V with that of physicians, 30 clinicians completed a web-based questionnaire consisting of 190 images.

Results: The multimodal AI approach achieved an accuracy of 82.1%, which is superior to that of certified pediatricians at 70.6%, but trailing behind that of otolaryngologists at more than 95%. The model’s disease-specific accuracy rates were 89.2% for acute otitis media, 76.5% for chronic otitis media, 79.3% for middle ear cholesteatoma, and 85.7% for otitis media with effusion, which highlights the need for disease-specific optimization. Comparisons with physicians revealed promising results, suggesting the potential of GPT-4V to augment clinical decision-making.

Conclusions: Despite its advantages, challenges such as data privacy and ethical considerations must be addressed. Overall, this study underscores the potential of multimodal AI for enhancing diagnostic accuracy and improving patient care in otolaryngology. Further research is warranted to optimize and validate this approach in diverse clinical settings.

Publisher

JMIR Publications Inc.

Reference: 30 articles.

1. Identifying Medical Diagnoses and Treatable Diseases by Image-Based Deep Learning

2. Deep learning for whole-body medical image generation

3. A deep learning-based system capable of detecting pneumothorax via electrocardiogram

4. Automated multi-class classification for prediction of tympanic membrane changes with deep learning models

5. Deep Learning Techniques for Ear Diseases Based on Segmentation of the Normal Tympanic Membrane

Cited by 3 articles.

1. Middle ear-acquired cholesteatoma diagnosis based on CT scan image mining using supervised machine learning models. Beni-Suef University Journal of Basic and Applied Sciences. 2024-08-15

2. Advancing Medical Education: Performance of Generative Artificial Intelligence Models on Otolaryngology Board Preparation Questions With Image Analysis Insights. Cureus. 2024-07-09

3. Correction: Feasibility of Multimodal Artificial Intelligence Using GPT-4 Vision for the Classification of Middle Ear Disease: Qualitative Study and Validation (Preprint). 2024-06-06