BACKGROUND
Since its initial launch, ChatGPT, a large language model, has been widely used across many disciplines, particularly in the medical field.
OBJECTIVE
The main aim of this review is to thoroughly assess the performance of ChatGPT on written subspecialty medical proficiency examinations and to compare the performance of its distinct versions.
METHODS
Three online databases (PubMed, CINAHL, and Web of Science) were searched for articles that fit the intended objectives of the study. A group of reviewers was assembled to develop a methodological framework for article inclusion.
RESULTS
Sixteen articles were included in this review. These assessed the performance of different ChatGPT versions on written examinations across a range of subspecialties, including surgical subspecialties as well as neurology, orthopedics, trauma and orthopedics, core cardiology, family medicine, and dermatology. Reported accuracy rates ranged from 35.8% to 91% across the different datasets and subspecialties.
CONCLUSIONS
This review indicates that ChatGPT can be used to enhance learning, provide customized feedback, and support medical students preparing for a range of medical subspecialty exams. However, ongoing evaluation of this AI tool must be strengthened to prevent misuse and any detrimental effects on real-world medical practice.