Will code one day run a code? Performance of language models on ACEM primary examinations and implications

Published:2023-07-06 Issue:5 Volume:35 Page:876-878
ISSN:1742-6731
Container-title:Emergency Medicine Australasia
language:en
Short-container-title:Emerg Medicine Australasia

Author:

Smith Jesse¹, Choi Philip MC²³, Buntine Paul¹²

Affiliation:

1. Eastern Health Emergency Medicine Program Eastern Health Melbourne Victoria Australia

2. Department of Neuroscience Eastern Health Melbourne Victoria Australia

3. Eastern Health Clinical School Monash University Melbourne Victoria Australia

Abstract

Objective: Large language models (LLMs) have demonstrated mixed results in their ability to pass various specialist medical examinations, and their performance within the field of emergency medicine remains unknown.

Methods: We explored the performance of three prevalent LLMs (OpenAI's GPT series, Google's Bard, and Microsoft's Bing Chat) on a practice ACEM primary examination.

Results: All LLMs achieved a passing score, with GPT 4.0 outperforming the average candidate.

Conclusion: Large language models, by passing the ACEM primary examination, show potential as tools for medical education and practice. However, limitations exist and are discussed.

Publisher

Wiley

Subject

Emergency Medicine

Link

https://onlinelibrary.wiley.com/doi/pdf/10.1111/1742-6723.14280

Reference: 5 articles.

1. Chat Generative Pretrained Transformer Fails the Multiple-Choice American College of Gastroenterology Self-Assessment Test

2. ChatGPT takes on the European Exam in Core Cardiology: an artificial intelligence success story?

3. Performance of ChatGPT on a Radiology Board-style Examination: Insights into Current Strengths and Limitations

4. Performance of ChatGPT, GPT-4, and Google Bard on a Neurosurgery Oral Boards Preparation Question Bank

5. OpenAI. GPT-4 Technical Report. 2023. [Cited 11 Jun 2023.] Available from URL: https://cdn.openai.com/papers/gpt-4.pdf

Cited by 7 articles.

1. Large language models in healthcare: from a systematic review on medical examinations to a comparative analysis on fundamentals of robotic surgery online test;Artificial Intelligence Review;2024-08-06

2. Einsatz von Künstlicher Intelligenz in der Notaufnahme;Notaufnahme up2date;2024-07

3. The Role of Large Language Models in Transforming Emergency Medicine: Scoping Review;JMIR Medical Informatics;2024-05-10

4. Evaluating Accuracy and Reproducibility of Large Language Model Performance in Pharmacy Education;2024-03-24

5. Prompt engineering in consistency and reliability with the evidence-based guideline for LLMs;npj Digital Medicine;2024-02-20