Performance of large language models at the MRCS Part A: a tool for medical education?

Author:

Yiu A12,Lam K2

Affiliation:

1. King’s College Hospital NHS Foundation Trust, UK

2. Imperial College London, UK

Abstract

Introduction The Intercollegiate Membership of the Royal College of Surgeons examination (MRCS) Part A assesses generic surgical sciences and applied knowledge using 300 multiple-choice Single Best Answer items. Large Language Models (LLMs) are trained on vast amounts of text to generate natural language outputs, and applications in healthcare and medical education are rising. Methods Two LLMs, ChatGPT (OpenAI) and Bard (Google AI), were tested using 300 questions from a popular MRCS Part A question bank without/with need for justification (NJ/J). LLM outputs were scored according to accuracy, concordance and insight. Results ChatGPT achieved 85.7%/84.3% accuracy for NJ/J encodings. Bard achieved 64%/64.3% accuracy for NJ/J encodings. ChatGPT and Bard displayed high levels of concordance for NJ (95.3%; 81.7%) and J (93.7%; 79.7%) encodings, respectively. ChatGPT and Bard provided an insightful statement in >98% and >86% outputs, respectively. Discussion This study demonstrates that ChatGPT achieves passing-level accuracy at MRCS Part A, and both LLMs achieve high concordance and provide insightful responses to test questions. Instances of clinically inappropriate or inaccurate decision-making, incomplete appreciation of nuanced clinical scenarios and utilisation of out-of-date guidance was, however, noted. LLMs are accessible and time-efficient tools, access vast clinical knowledge, and may reduce the emphasis on factual recall in medical education and assessment. Conclusion ChatGPT achieves passing-level accuracy for MRCS Part A with concordant and insightful outputs. Future applications of LLMs in healthcare must be cautious of hallucinations and incorrect reasoning but have the potential to develop AI-supported clinicians.

Publisher

Royal College of Surgeons of England

Subject

General Medicine,Surgery

Reference18 articles.

1. Does the MRCS exam fulfil its purpose in surgical professions?

2. MRCS & DO-HNS Examinations. ICBSE Annual Report 2021/2022. JCST. Available from: https://www.intercollegiatemrcsexams.org.uk/-/media/Files/IMRCS/Policies-and-reports/ICBSE-Annual-Report-202122-FINAL.pdf (cited 2023 Jul 18).

3. Differential attainment at MRCS according to gender, ethnicity, age and socioeconomic factors: a retrospective cohort study

4. Benefits, Limits, and Risks of GPT-4 as an AI Chatbot for Medicine

同舟云学术

1.学者识别学者识别

2.学术分析学术分析

3.人才评估人才评估

"同舟云学术"是以全球学者为主线,采集、加工和组织学术论文而形成的新型学术文献查询和分析系统,可以对全球学者进行文献检索和人才价值评估。用户可以通过关注某些学科领域的顶尖人物而持续追踪该领域的学科进展和研究前沿。经过近期的数据扩容,当前同舟云学术共收录了国内外主流学术期刊6万余种,收集的期刊论文及会议论文总量共计约1.5亿篇,并以每天添加12000余篇中外论文的速度递增。我们也可以为用户提供个性化、定制化的学者数据。欢迎来电咨询!咨询电话:010-8811{复制后删除}0370

www.globalauthorid.com

TOP

Copyright © 2019-2024 北京同舟云网络信息技术有限公司
京公网安备11010802033243号  京ICP备18003416号-3