Abstract
Objectives
As a large language model (LLM) trained on a large dataset, ChatGPT can perform a wide array of tasks without additional training. We evaluated the performance of ChatGPT on UK postgraduate medical examinations in two ways. First, we conducted a systematic literature review of its performance in UK postgraduate medical assessments. Second, we tested it on the Membership of the Royal Colleges of Physicians (MRCP) Part 1 examination.
Methods
The Medline, Embase and Cochrane databases were searched. Articles discussing the performance of ChatGPT in UK postgraduate medical examinations were included in the systematic review. Information on examination performance, including percentage scores and pass/fail rates, was extracted.
MRCP UK Part 1 sample paper questions were entered into ChatGPT-3.5 and ChatGPT-4 four times each, and the responses were marked against the correct answers provided.
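The marking step can be sketched as follows. This is a minimal illustration, not the authors' procedure. It assumes single-best-answer questions recorded as option letters and assumes that the score for each model is the mean over its four runs. The answer key and responses below are hypothetical.

```python
from statistics import mean


def score_run(responses: dict[str, str], answer_key: dict[str, str]) -> float:
    """Return the percentage of questions answered correctly in one run."""
    correct = sum(
        1
        for qid, answer in answer_key.items()
        # Case-insensitive comparison; unanswered questions count as wrong.
        if responses.get(qid, "").strip().upper() == answer.upper()
    )
    return 100 * correct / len(answer_key)


def mean_score(runs: list[dict[str, str]], answer_key: dict[str, str]) -> float:
    """Average the percentage score over repeated runs of the same paper."""
    return mean(score_run(run, answer_key) for run in runs)


# Hypothetical four-question answer key and four runs of the same model.
answer_key = {"q1": "A", "q2": "C", "q3": "E", "q4": "B"}
runs = [
    {"q1": "A", "q2": "C", "q3": "E", "q4": "B"},
    {"q1": "A", "q2": "D", "q3": "E", "q4": "B"},
    {"q1": "A", "q2": "C", "q3": "A", "q4": "B"},
    {"q1": "b", "q2": "C", "q3": "E", "q4": "B"},
]
print(mean_score(runs, answer_key))  # 81.25
```

Repeating each paper and averaging the results accounts for ChatGPT not always returning the same answer to an identical prompt.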
Results
Twelve studies were included in the systematic literature review.
ChatGPT-3.5 scored 66.4% and ChatGPT-4 scored 84.8% on the MRCP Part 1 sample paper. These scores are 4.4 and 22.8 percentage points above the historical pass mark, respectively. The performance of both ChatGPT-3.5 and ChatGPT-4 was significantly above the historical pass mark for MRCP Part 1, which indicates that both models would likely pass this examination.
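The two margins are differences, in percentage points, between each score and the same pass mark. The abstract does not state the pass mark itself, but a quick consistency check recovers the value implied by the reported figures. The result (62%) is inferred here and is not quoted from the source.

```python
# Scores reported for the MRCP Part 1 sample paper (percent correct).
scores = {"ChatGPT-3.5": 66.4, "ChatGPT-4": 84.8}
# Reported margins above the historical pass mark (percentage points).
margins = {"ChatGPT-3.5": 4.4, "ChatGPT-4": 22.8}

# Pass mark implied by each score/margin pair; rounding removes float noise.
implied = {model: round(scores[model] - margins[model], 1) for model in scores}
print(implied)  # {'ChatGPT-3.5': 62.0, 'ChatGPT-4': 62.0}
```

Both pairs imply the same pass mark, so the reported scores and margins are internally consistent.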
ChatGPT-3.5 failed eight of the nine postgraduate examinations on which it was tested. On average, its scores were 5.0% below the pass mark.
ChatGPT-4 passed nine of the eleven postgraduate examinations on which it was tested. On average, its scores were 13.56% above the pass mark. ChatGPT-4 performed significantly better than ChatGPT-3.5 in every examination on which both models were tested.
Conclusion
ChatGPT-4 performed above the passing level in the majority of UK postgraduate medical examinations on which it was tested. However, ChatGPT is prone to hallucinations, fabrications and reduced explanation accuracy, which could limit its potential as a learning tool. The potential for these errors is inherent to LLMs and may remain a limitation for medical applications of ChatGPT.
Publisher
Public Library of Science (PLoS)