Assessing ChatGPT’s orthopedic in-service training exam performance and applicability in the field-Reference-Cited by-同舟云学术

Assessing ChatGPT’s orthopedic in-service training exam performance and applicability in the field

Published:2024-01-03 Issue:1 Volume:19 Page:
ISSN:1749-799X
Container-title:Journal of Orthopaedic Surgery and Research
language:en
Short-container-title:J Orthop Surg Res

Author:

Jain Neil^ORCID,Gottlich Caleb,Fisher John,Campano Dominic,Winston Travis

Abstract

Abstract Background ChatGPT has gained widespread attention for its ability to understand and provide human-like responses to inputs. However, few works have focused on its use in Orthopedics. This study assessed ChatGPT’s performance on the Orthopedic In-Service Training Exam (OITE) and evaluated its decision-making process to determine whether adoption as a resource in the field is practical. Methods ChatGPT’s performance on three OITE exams was evaluated through inputting multiple choice questions. Questions were classified by their orthopedic subject area. Yearly, OITE technical reports were used to gauge scores against resident physicians. ChatGPT’s rationales were compared with testmaker explanations using six different groups denoting answer accuracy and logic consistency. Variables were analyzed using contingency table construction and Chi-squared analyses. Results Of 635 questions, 360 were useable as inputs (56.7%). ChatGPT-3.5 scored 55.8%, 47.7%, and 54% for the years 2020, 2021, and 2022, respectively. Of 190 correct outputs, 179 provided a consistent logic (94.2%). Of 170 incorrect outputs, 133 provided an inconsistent logic (78.2%). Significant associations were found between test topic and correct answer (p = 0.011), and type of logic used and tested topic (p = < 0.001). Basic Science and Sports had adjusted residuals greater than 1.96. Basic Science and correct, no logic; Basic Science and incorrect, inconsistent logic; Sports and correct, no logic; and Sports and incorrect, inconsistent logic; had adjusted residuals greater than 1.96. Conclusions Based on annual OITE technical reports for resident physicians, ChatGPT-3.5 performed around the PGY-1 level. When answering correctly, it displayed congruent reasoning with testmakers. When answering incorrectly, it exhibited some understanding of the correct answer. It outperformed in Basic Science and Sports, likely due to its ability to output rote facts. These findings suggest that it lacks the fundamental capabilities to be a comprehensive tool in Orthopedic Surgery in its current form. Level of Evidence: II.

Publisher

Springer Science and Business Media LLC

Link

https://link.springer.com/content/pdf/10.1186/s13018-023-04467-0.pdf

Reference22 articles.

1. Shen Y, Heacock L, Elias J, Hentel KD, Reig B, Shih G, et al. ChatGPT and other large language models are double-edged swords. Radiology. 2023;307(2): e230163.

2. Vaswani A, Shazeer N, Parmar N, Uszkoreit J, Jones L, Gomez AN, Kaiser Ł, Polosukhin I. Attention is all you need. Adv Neural Inf Process Syst 2017:5998–6008.

3. Gilson A, Safranek CW, Huang T, Socrates V, Chi L, Taylor RA, et al. How does ChatGPT perform on the United States medical licensing examination? The Implications of large language models for medical education and knowledge assessment. JMIR Med Educ. 2023;9: e45312.

4. Rao A, Kim J, Kamineni M, Pang M, Lie W, Succi MD. Evaluating ChatGPT as an Adjunct for Radiologic Decision-Making. medRxiv. 2023.

5. Kung TH, Cheatham M, Medenilla A, Sillos C, De Leon L, Elepaño C, et al. Performance of ChatGPT on USMLE: potential for AI-assisted medical education using large language models. PLOS Digit Health. 2023;2(2): e0000198.

Cited by 5 articles. 订阅此论文施引文献订阅此论文施引文献，注册后可以免费订阅5篇论文的施引文献，订阅后可以查看论文全部施引文献

1. ChatGPT-4 Surpasses Residents: A Study of Artificial Intelligence Competency in Plastic Surgery In-service Examinations and Its Advancements from ChatGPT-3.5;Plastic and Reconstructive Surgery - Global Open;2024-09

2. Performance of ChatGPT-3.5 and ChatGPT-4 on the European Board of Urology (EBU) exams: a comparative analysis;World Journal of Urology;2024-07-26

3. Transformative learning with ChatGPT: analyzing adoption trends and implications for business management students in India;Interactive Technology and Smart Education;2024-07-16

4. Assessing the Efficacy of an AI-Powered Chatbot (ChatGPT) in Providing Information on Orthopedic Surgeries: A Comparative Study With Expert Opinion;Cureus;2024-06-27

5. A quality and readability comparison of artificial intelligence and popular health website education materials for common hand surgery procedures;Hand Surgery and Rehabilitation;2024-06