Affiliation:
1. Charles E. Schmidt College of Medicine at Florida Atlantic University, Boca Raton, Florida
2. Advent Health Orlando, Orlando, Florida
3. Neurotology, Advent Health Celebration, Celebration, Florida
Abstract
Objective
To investigate the accuracy of language-model artificial intelligence (AI) in diagnosing otologic conditions by comparing its predictions, generated from patient-described symptoms, with diagnoses made by board-certified otologic/neurotologic surgeons.
Study Design
Prospective cohort study.
Setting
Tertiary care center.
Patients
One hundred adults participated in the study. These included new patients or established patients returning with new symptoms. Individuals were excluded if they could not provide a written description of their symptoms.
Interventions
Summaries of the patients' symptoms were supplied to three publicly available AI platforms: ChatGPT 4.0, Google Bard, and the WebMD “Symptom Checker.”
Main Outcome Measures
Accuracy of the three AI platforms in diagnosing otologic conditions, determined by comparing AI results with the diagnosis made by a neurotologist, first using the same information provided to the AI platforms and again after a complete history and physical examination.
Results
The study includes 100 patients (52 men and 48 women; average age, 59.2 yr). Fleiss' kappa between AI and the physician is −0.103 (p < 0.01). The chi-squared test between AI and the physician yields χ2 = 12.95 (df = 2; p < 0.001). Fleiss' kappa among the AI models is 0.409. Diagnostic accuracies are 22.45%, 12.24%, and 5.10% for ChatGPT 4.0, Google Bard, and WebMD, respectively.
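For readers who wish to see how agreement statistics of this kind are computed, the following is a minimal sketch using standard Python libraries (statsmodels and scipy), not the study's analysis code; the ratings matrix and contingency counts are hypothetical and are included only to illustrate the Fleiss' kappa and chi-squared calculations reported above.

    # Minimal sketch (hypothetical data, not the study's): Fleiss' kappa and a
    # chi-squared test of independence using statsmodels and scipy.
    import numpy as np
    from scipy.stats import chi2_contingency
    from statsmodels.stats.inter_rater import aggregate_raters, fleiss_kappa

    # Hypothetical ratings matrix: rows = patients, columns = raters
    # (e.g., ChatGPT 4.0, Google Bard, WebMD, physician); values = coded diagnoses.
    ratings = np.array([
        [0, 0, 1, 0],
        [2, 1, 2, 2],
        [1, 1, 0, 1],
        [0, 2, 2, 0],
        [1, 0, 1, 1],
    ])

    # Fleiss' kappa expects a subjects-by-categories count table.
    counts, _categories = aggregate_raters(ratings)
    print("Fleiss' kappa:", fleiss_kappa(counts, method="fleiss"))

    # Chi-squared test on a hypothetical platform-by-outcome contingency table
    # (rows = AI platforms, columns = correct vs. incorrect diagnoses).
    contingency = np.array([
        [22, 78],
        [12, 88],
        [5, 95],
    ])
    chi2, p, dof, _expected = chi2_contingency(contingency)
    print(f"chi-squared = {chi2:.2f}, df = {dof}, p = {p:.4f}")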
Conclusions
Contemporary language-model AI platforms can generate extensive differential diagnoses with limited data input. However, doctors can refine these diagnoses through focused history-taking, physical examinations, and clinical experience—skills that current AI platforms lack.