Abstract
Purpose
GPT-4, recently released by OpenAI, improves upon GPT-3.5 with greater reliability and expanded capabilities, including support for user-created, customizable GPT-4 models. This study investigates the performance of GPT-4 relative to GPT-3.5 on Otolaryngology board-style questions.
Methods
One hundred fifty Otolaryngology board-style questions were obtained from the BoardVitals question bank. These questions, previously assessed with GPT-3.5, were entered into standard GPT-4 and into a custom GPT-4 model instructed to specialize in Otolaryngology board-style questions, emphasize precision, and provide evidence-based explanations.
Results
Standard GPT-4 correctly answered 72.0% of the questions and custom GPT-4 correctly answered 81.3%, compared with GPT-3.5, which answered 51.3% of the same questions correctly. On multivariable analysis, custom GPT-4 had higher odds of answering correctly than standard GPT-4 (adjusted odds ratio 2.19, P = 0.015). Both standard GPT-4 and custom GPT-4 showed a decline in performance from questions rated easy to those rated hard (P < 0.001).
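For readers unfamiliar with the statistic, an adjusted odds ratio such as the one above is typically obtained from a multivariable logistic regression in which the outcome is whether a question was answered correctly. The following is a minimal, hypothetical sketch of that kind of analysis (simulated data; not the authors' code, data, or exact covariates), assuming indicator variables for model type and question difficulty:

```python
import numpy as np
import statsmodels.api as sm

# Hypothetical, simulated data: each row is one question attempt.
rng = np.random.default_rng(0)
n = 300
is_custom = rng.integers(0, 2, n)   # 1 = custom GPT-4, 0 = standard GPT-4
hard = rng.integers(0, 2, n)        # 1 = question rated hard (illustrative covariate)

# Simulate correct/incorrect outcomes from an assumed logistic model.
logit = 0.9 + 0.8 * is_custom - 1.2 * hard
correct = rng.binomial(1, 1 / (1 + np.exp(-logit)))

# Multivariable logistic regression: outcome ~ model type + difficulty.
X = sm.add_constant(np.column_stack([is_custom, hard]))
fit = sm.Logit(correct, X).fit(disp=False)

# Exponentiating the coefficient on the model-type indicator yields the
# adjusted odds ratio for custom vs. standard GPT-4.
print("adjusted OR (custom vs. standard):", np.exp(fit.params[1]))
print("P-value:", fit.pvalues[1])
```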
Conclusions
Our study suggests that GPT-4 answers Otolaryngology board-style questions more accurately than GPT-3.5. Our custom GPT-4 model demonstrated higher accuracy than standard GPT-4, potentially because of its instructions to specialize in Otolaryngology board-style questions, select exactly one answer, and emphasize precision. These findings suggest that custom models may further enhance the use of ChatGPT in medical education.