BACKGROUND
ChatGPT-4 is the most advanced large language model (LLM) to date; prior iterations have passed medical licensing exams, provided clinical decision support, and improved diagnostics. Although limited, past studies of ChatGPT's performance found that the AI could pass American Heart Association (AHA) Advanced Cardiac Life Support (ACLS) exams with modifications. ChatGPT-4's accuracy has not been studied in more complex clinical scenarios. As heart disease and cardiac arrest remain leading causes of morbidity and mortality in the United States, finding technologies that help increase adherence to ACLS algorithms, which improves survival outcomes, is critical.
OBJECTIVE
Our study examines the accuracy of ChatGPT-4 in following ACLS guidelines for bradycardia and cardiac arrest.
METHODS
We evaluated the accuracy of ChatGPT-4's responses to two simulations based on the 2020 AHA ACLS guidelines, with three primary outcomes of interest: the mean individual step accuracy, the accuracy score per simulation attempt, and the accuracy score for each algorithm. For each simulation step, ChatGPT-4 was scored as correct (1 point) or incorrect (0 points). Each simulation was conducted 20 times.
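A minimal sketch of how the three accuracy measures could be derived from per-step scores is shown below; the attempt scores and step counts are hypothetical placeholders, not the study's data.

```python
from statistics import mean

# Hypothetical per-step scores for one simulation (1 = correct, 0 = incorrect);
# in the study design, each simulation was run 20 times against the 2020 AHA
# ACLS algorithm steps.
attempts = [
    [1, 1, 0, 1, 1, 0, 1],  # attempt 1
    [1, 0, 0, 1, 1, 1, 1],  # attempt 2
    # ... additional attempts ...
]

# Accuracy score per simulation attempt: fraction of steps answered correctly.
per_attempt = [sum(a) / len(a) for a in attempts]

# Mean individual step accuracy: how often each algorithm step was correct
# across all attempts.
per_step = [mean(step_scores) for step_scores in zip(*attempts)]

# Accuracy score for the algorithm: mean accuracy across all attempts.
algorithm_accuracy = mean(per_attempt)

print(per_attempt, per_step, algorithm_accuracy)
```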
RESULTS
ChatGPT-4's average accuracy was 69% (IQR: 7; 67%–75%) for cardiac arrest and 43% (IQR: 17; 33%–50%) for bradycardia. We found that ChatGPT-4's outputs varied despite consistent input, the same actions were persistently missed, repetitive overemphasis hindered guidance, and erroneous medication information was presented.
CONCLUSIONS
This study highlights the need for consistent and reliable guidance to prevent potential medical errors, and for further optimization of ChatGPT-4 to enhance its reliability and effectiveness in clinical practice.