Performance of a Large Language Model in the Generation of Clinical Guidelines for Antibiotic Prophylaxis in Spine Surgery

Published: 2024-03-31 Issue: 1 Volume: 21 Page: 128-146
ISSN: 2586-6583
Container-title: Neurospine
Language: en
Short-container-title: Neurospine

Author:

Zaidat Bashar, Shrestha Nancy, Rosenberg Ashley M., Ahmed Wasil, Rajjoub Rami, Hoang Timothy, Mejia Mateo Restrepo, Duey Akiro H., Tang Justin E., Kim Jun S., Cho Samuel K.

Abstract

Objective: Large language models, such as Chat Generative Pre-trained Transformer (ChatGPT), have great potential for streamlining medical processes and assisting physicians in clinical decision-making. This study aimed to assess the potential of 2 ChatGPT models (GPT-3.5 and GPT-4.0) to support clinical decision-making by comparing their responses on antibiotic prophylaxis in spine surgery to accepted clinical guidelines.

Methods: ChatGPT models were prompted with questions from the North American Spine Society (NASS) Evidence-based Clinical Guidelines for Multidisciplinary Spine Care for Antibiotic Prophylaxis in Spine Surgery (2013). The models’ responses were then compared with the guideline and assessed for accuracy.

Results: Of the 16 NASS guideline questions concerning antibiotic prophylaxis, 10 responses (62.5%) were accurate with GPT-3.5 and 13 (81%) were accurate with GPT-4.0. Twenty-five percent of GPT-3.5 answers were deemed overly confident, while 62.5% of GPT-4.0 answers directly used the NASS guideline as evidence.

Conclusion: ChatGPT demonstrated an impressive ability to accurately answer clinical questions. The GPT-3.5 model’s performance was limited by its tendency to give overly confident responses and its inability to identify the most significant elements in its responses. The GPT-4.0 model’s responses had higher accuracy and frequently cited the NASS guideline as direct evidence. While GPT-4.0 is still far from perfect, it has shown an exceptional ability to extract the most relevant research available compared to GPT-3.5. Thus, while ChatGPT has shown far-reaching potential, scrutiny should still be exercised regarding its clinical use at this time.

Publisher

The Korean Spinal Neurosurgery Society

Link

http://www.e-neurospine.org/upload/pdf/ns-2347310-655.pdf

References: 40 articles.

1. Can a Novel Natural Language Processing Model and Artificial Intelligence Automatically Generate Billing Codes From Spine Surgical Operative Notes?

2. Can Natural Language Processing and Artificial Intelligence Automate The Generation of Billing Codes From Operative Note Dictations?

3. Validation of Prediction Models for Critical Care Outcomes Using Natural Language Processing of Electronic Health Record Data

4. Prediction of Emergency Department Hospital Admission Based on Natural Language Processing and Neural Networks

Cited by 4 articles.

1. Enhancing Diagnostic Support for Chiari Malformation and Syringomyelia: A Comparative Study of Contextualized ChatGPT Models; World Neurosurgery; 2024-09

2. Artificial intelligence and prescription of antibiotic therapy: present and future; Expert Review of Anti-infective Therapy; 2024-08-18

3. Toward Clinical Generative AI: Conceptual Framework; JMIR AI; 2024-06-07

4. Commentary on “Performance of a Large Language Model in the Generation of Clinical Guidelines for Antibiotic Prophylaxis in Spine Surgery”; Neurospine; 2024-03-31