Affiliation:
1. Keşan State Hospital
2. Bakırköy Dr. Sadi Konuk Training and Research Hospital
3. Kocaeli University
4. Curistica Ltd
Abstract
Objective: Being publicly available, easy to use, and continuously evolving, next-generation chatbots have the potential to be used in triage, one of the most critical functions of an Emergency Department. The aim of this study was to assess the performance of Generative Pre-trained Transformer 4 (GPT-4), Bard and Claude during decision-making for Emergency Department triage.
Material and Methods: This was a preliminary cross-sectional study conducted with 50 case scenarios. Emergency Medicine specialists determined the reference Emergency Severity Index triage category of each scenario. Subsequently, each case scenario was queried using three chatbots. Inconsistent classifications between the chatbots and references were defined as over-triage (false positive) or under-triage (false negative). The primary and secondary outcomes were the predictive performance of chatbots and the difference between them in predicting high acuity triage.
Results: F1 Scores for GPT-4, Bard, and Claude for predicting Emergency Severity Index 1 and 2 were 0.899, 0.791, and 0.865 respectively. The ROC Curve of GPT-4 for high acuity predictions showed an area under the curve (AUC) of 0.911 (95% CI: 0,814-1; p
Publisher
Kirikkale Universitesi Tıp Fakultesi Dergisi
Reference25 articles.
1. Lee P, Bubeck S, Petro J. Benefits, limits, and risks of GPT-4 as an AI chatbot for medicine. New England Journal of Medicine. 2023;388(13):1233-9.
2. OpenAI. GPT-4 technical report. ArXiv. Accessed date: September 29, 2023: https://arxiv.org/abs/2303.08774.
3. Katz DM, Bommarito MJ, Gao S, Arredondo P. GPT-4 passes the bar exam. SSRN Electronic Journal. Published online 2023.
4. Google. Bard FAQ. Accessed date: April 21, 2023: https://bard.google.com/faq?hl=en
5. Anthropic. Introducing Claude. Accessed date: April 21, 2023:https://www.anthropic.com/index/introducing- claude