Reliability of ChatGPT for performing triage task in the emergency department using the Korean Triage and Acuity Scale

Published:2024-01 Volume:10
ISSN:2055-2076
Container-title:DIGITAL HEALTH
language:en
Short-container-title:DIGITAL HEALTH

Author:

Kim Jae Hyuk¹, Kim Sun Kyung²³, Choi Jongmyung⁴, Lee Youngho⁴

Affiliation:

1. Department of Emergency Medicine, Mokpo Hankook Hospital, Jeonnam, South Korea

2. Department of Nursing, Mokpo National University, Jeonnam, South Korea

3. Department of Biomedicine, Health & Life Convergence Sciences, Biomedical and Healthcare Research Institute, Jeonnam, South Korea

4. Department of Computer Engineering, Mokpo National University, Jeonnam, South Korea

Abstract

Background: Artificial intelligence (AI) technology can enable more efficient decision-making in healthcare settings. There is a growing interest in improving the speed and accuracy of AI systems in providing responses for given tasks in healthcare settings.

Objective: This study aimed to assess the reliability of ChatGPT in determining emergency department (ED) triage accuracy using the Korean Triage and Acuity Scale (KTAS).

Methods: Two hundred and two virtual patient cases were built. The gold standard triage classification for each case was established by an experienced ED physician. Three other human raters (ED paramedics) were involved and rated the virtual cases individually. The virtual cases were also rated by two different versions of the chat generative pre-trained transformer (ChatGPT, 3.5 and 4.0). Inter-rater reliability was examined using Fleiss’ kappa and intra-class correlation coefficient (ICC).

Results: The kappa values for the agreement between the four human raters and ChatGPTs were .523 (version 4.0) and .320 (version 3.5). Of the five levels, the performance was poor when rating patients at levels 1 and 5, as well as case scenarios with additional text descriptions. There were differences in the accuracy of the different versions of GPTs. The ICC between version 3.5 and the gold standard was .520, and that between version 4.0 and the gold standard was .802.

Conclusions: A substantial level of inter-rater reliability was revealed when GPTs were used as KTAS raters. The current study showed the potential of using GPT in emergency healthcare settings. Considering the shortage of experienced manpower, this AI method may help improve triaging accuracy.
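For readers unfamiliar with the agreement statistic named in the Methods, the following is a minimal sketch of how Fleiss' kappa can be computed for several raters assigning KTAS levels 1 to 5. It is not the authors' code: the function name `fleiss_kappa` and the toy ratings are hypothetical and serve only to illustrate the formula; the study's actual data are not reproduced here.

```python
from collections import Counter


def fleiss_kappa(ratings, categories):
    """Fleiss' kappa for a list of cases, each rated by the same number of raters.

    ratings    -- list of lists; ratings[i] holds every rater's label for case i
    categories -- all possible labels (here, KTAS levels 1..5)
    """
    n_cases = len(ratings)
    n_raters = len(ratings[0])
    if n_raters < 2 or any(len(r) != n_raters for r in ratings):
        raise ValueError("every case needs the same number (>= 2) of ratings")

    # counts[i][j] = number of raters who assigned category j to case i
    counts = []
    for case in ratings:
        tally = Counter(case)
        counts.append([tally[c] for c in categories])

    # Share of all assignments that fell into each category
    total = n_cases * n_raters
    p_cat = [sum(row[j] for row in counts) / total
             for j in range(len(categories))]

    # Observed agreement per case, then averaged over cases
    per_case = [(sum(c * c for c in row) - n_raters)
                / (n_raters * (n_raters - 1)) for row in counts]
    p_observed = sum(per_case) / n_cases

    # Agreement expected by chance
    p_expected = sum(p * p for p in p_cat)
    if p_expected == 1.0:
        return float("nan")  # undefined when only one category is ever used

    return (p_observed - p_expected) / (1 - p_expected)


# Hypothetical toy data: 4 virtual cases, 5 raters each (not from the study)
toy_ratings = [
    [2, 2, 2, 3, 2],
    [4, 4, 5, 4, 4],
    [1, 2, 1, 1, 2],
    [3, 3, 3, 3, 3],
]
print(fleiss_kappa(toy_ratings, categories=[1, 2, 3, 4, 5]))
```

Note that Fleiss' kappa treats the KTAS levels as unordered categories, so a disagreement between levels 1 and 2 counts the same as one between levels 1 and 5. The ICC reported alongside it does take the ordering of the levels into account.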

Funder

Ministry of the Interior and Safety

National Research Foundation of Korea

Publisher

SAGE Publications

Link

http://journals.sagepub.com/doi/pdf/10.1177/20552076241227132

References: 50 articles.

1. Out-of-hours care in western countries: assessment of different organizational models

2. Acuity Assessment in Obstetrical Triage

3. Validity of the Manchester Triage System in emergency care: A prospective observational study

4. Performance of triage systems in emergency care: a systematic review and meta-analysis

5. Validity of telephone and physical triage in emergency care: The Netherlands Triage System

Cited by 10 articles.

1. User satisfaction with the service quality of ChatGPT;Service Business;2024-09-02

2. Decoding medical jargon: The use of AI language models (ChatGPT-4, BARD, microsoft copilot) in radiology reports;Patient Education and Counseling;2024-09

3. Large language model application in emergency medicine and critical care;Journal of the Formosan Medical Association;2024-08

4. Evaluating Large Language Models’ Ability Using a Psychiatric Screening Tool Based on Metaphor and Sarcasm Scenarios;Journal of Intelligence;2024-07-21

5. Emergency Patient Triage Improvement through a Retrieval-Augmented Generation Enhanced Large-Scale Language Model;Prehospital Emergency Care;2024-07-11