Affiliation:
1. Guy’s and St Thomas’ Hospital National Health Service Foundation Trust, London, SE1 9RT, United Kingdom
Abstract
Purpose
Chat Generative Pre-trained Transformer (ChatGPT) is an artificial intelligence (AI) large language model that generates contextually relevant text in response to questions. After ChatGPT passed the United States Medical Licensing Examination, proponents have argued it should play an increasing role in medical service provision and education. AI in healthcare remains in its infancy, and the reliability of AI systems must be scrutinized. This study assessed whether ChatGPT could pass Section 1 of the Fellowship of the Royal College of Surgeons (FRCS) examination in Trauma and Orthopaedic Surgery.
Methods
The UK and Ireland In-Training Examination (UKITE) was used as a surrogate for the FRCS. Papers 1 and 2 of UKITE 2022 were entered directly into ChatGPT. All questions were in single-best-answer format and were submitted without alteration to their wording. Questions containing imaging were trialled to establish whether ChatGPT could make use of this information.
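The study entered questions through the ChatGPT web interface. For readers interested in reproducing this kind of evaluation programmatically, the following is a minimal sketch of how a batch of single-best-answer (SBA) questions could be submitted and scored via the OpenAI Python API; the model name, question file layout, and helper functions are illustrative assumptions rather than part of the study’s method.

```python
# Hypothetical sketch of a scripted SBA evaluation (not the study's method).
# Assumes questions are stored as JSON: [{"question", "options", "answer"}, ...].
import json

from openai import OpenAI

client = OpenAI()  # reads OPENAI_API_KEY from the environment


def ask_sba(question: str, options: dict[str, str]) -> str:
    """Send one SBA question and return the model's chosen option letter."""
    prompt = question + "\n" + "\n".join(f"{k}. {v}" for k, v in options.items())
    response = client.chat.completions.create(
        model="gpt-3.5-turbo",  # assumption: the model underlying free ChatGPT at the time
        messages=[
            {"role": "system",
             "content": "Answer with the single best option letter only."},
            {"role": "user", "content": prompt},
        ],
    )
    # Take the first character of the reply as the option letter (fragile; a sketch).
    return response.choices[0].message.content.strip()[0].upper()


def score_paper(paper_path: str) -> float:
    """Return the percentage of questions answered correctly against the key."""
    with open(paper_path) as f:
        questions = json.load(f)
    correct = sum(
        ask_sba(q["question"], q["options"]) == q["answer"] for q in questions
    )
    return 100 * correct / len(questions)
```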
Results
ChatGPT scored 35.8%: 30% lower than the FRCS pass rate and 8.2% lower than the mean score achieved by human candidates across all training levels. Subspecialty analysis showed ChatGPT scored highest in basic science (53.3%) and lowest in trauma (0%). Of the 87 questions answered incorrectly, ChatGPT stated that it did not know the answer only once, and it gave incorrect explanatory answers for the remainder.
Conclusion
ChatGPT is currently unable to exercise the higher-order judgement and multilogical thinking required to pass the FRCS examination. Furthermore, the current model fails to recognize its own limitations. ChatGPT’s deficiencies should be publicized as widely as its successes so that clinicians remain aware of its fallibility.
Key messages
What is already known on this topic
Following ChatGPT’s much-publicized success in passing the United States Medical Licensing Examination, clinicians and medical students are increasingly using the model for medical service provision and education. However, ChatGPT remains in its infancy, and the model’s reliability and accuracy remain unproven.
What this study adds
This study demonstrates that ChatGPT is currently unable to exercise the higher-order judgement and multilogical thinking required to pass the Fellowship of the Royal College of Surgeons (FRCS) (Trauma & Orthopaedics) examination. Furthermore, the current model fails to recognize its own limitations when offering both direct and explanatory answers.
How this study might affect research, practice, or policy
This study highlights the need for medical students and clinicians to exercise caution when employing ChatGPT as a revision tool or applying it in clinical practice, and for patients to be aware of its fallibilities when using it as a health resource. Future research questions include:
Publisher
Oxford University Press (OUP)