Use of ChatGPT for Determining Clinical and Surgical Treatment of Lumbar Disc Herniation With Radiculopathy: A North American Spine Society Guideline Comparison

Published: 2024-03-31
Journal: Neurospine (Volume 21, Issue 1, pages 149-158)
ISSN: 2586-6583
Language: English

Authors:

Mejia Mateo Restrepo, Arroyave Juan Sebastian, Saturno Michael, Ndjonko Laura Chelsea Mazudie, Zaidat Bashar, Rajjoub Rami, Ahmed Wasil, Zapolsky Ivan, Cho Samuel K.

Abstract

Objective: Large language models such as Chat Generative Pre-trained Transformer (ChatGPT) have been applied successfully in various sectors, but their use in medicine remains limited. This study assessed the feasibility of using ChatGPT to provide accurate medical information to patients. Specifically, it evaluated how well ChatGPT versions 3.5 and 4 aligned with the 2012 North American Spine Society (NASS) guidelines for lumbar disc herniation with radiculopathy.

Methods: ChatGPT's responses to questions based on the NASS guidelines were analyzed for accuracy. To deepen the analysis, three additional categories were introduced: overconclusiveness, supplementary information, and incompleteness. Overconclusiveness referred to recommendations not mentioned in the NASS guidelines. Supplementary information denoted additional relevant details. Incompleteness indicated that crucial information from the NASS guidelines was omitted.

Results: Of the 29 clinical guidelines evaluated, ChatGPT-3.5 responded accurately to 15 (52%) and ChatGPT-4 to 17 (59%). ChatGPT-3.5 was overconclusive in 14 responses (48%) and ChatGPT-4 in 13 (45%). ChatGPT-3.5 provided supplementary information in 24 responses (83%) and ChatGPT-4 in 27 (93%). ChatGPT-3.5 was incomplete in 11 responses (38%) and ChatGPT-4 in 8 (28%).

Conclusion: ChatGPT shows promise for clinical decision-making, but both patients and healthcare providers should exercise caution to ensure safety and quality of care. Although these results are encouraging, further research is needed to validate the use of large language models in clinical settings.
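The percentages in the Results can be recomputed directly from the reported counts, each divided by the 29 guideline questions. Below is a minimal Python sketch of that arithmetic. The counts come from the abstract. The names `COUNTS`, `percentage`, and `summarize` are hypothetical helpers for illustration only; they are not the authors' analysis code. The sketch assumes each percentage is rounded to the nearest whole number.

```python
# Hypothetical helper, not the study's code: recomputes the abstract's
# percentages from its reported per-category counts.
TOTAL_GUIDELINES = 29  # number of NASS guideline questions evaluated

# Counts reported in the abstract, per model version and category.
COUNTS = {
    "ChatGPT-3.5": {
        "accurate": 15,
        "overconclusive": 14,
        "supplementary": 24,
        "incomplete": 11,
    },
    "ChatGPT-4": {
        "accurate": 17,
        "overconclusive": 13,
        "supplementary": 27,
        "incomplete": 8,
    },
}


def percentage(count: int, total: int = TOTAL_GUIDELINES) -> int:
    """Share of responses in a category, rounded to the nearest whole percent."""
    return round(100 * count / total)


def summarize(counts: dict[str, dict[str, int]]) -> dict[str, dict[str, int]]:
    """Convert raw category counts into whole-number percentages per model."""
    return {
        model: {category: percentage(n) for category, n in categories.items()}
        for model, categories in counts.items()
    }


if __name__ == "__main__":
    for model, pcts in summarize(COUNTS).items():
        row = ", ".join(
            f"{cat} {COUNTS[model][cat]} ({pct}%)" for cat, pct in pcts.items()
        )
        print(f"{model}: {row}")
```

Because a single response can fall into several categories, one model's percentages sum to more than 100%. For ChatGPT-3.5, for example, 52 + 48 + 83 + 38 = 221. Each category should therefore be read as an independent rate, not as a share of a partition.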

Publisher

The Korean Spinal Neurosurgery Society

Link

http://www.e-neurospine.org/upload/pdf/ns-2347052-526.pdf

References: 23 articles

1. Performance of ChatGPT on USMLE: Potential for AI-assisted medical education using large language models

2. Lumbosacral Radiculopathy

3. An evidence-based clinical guideline for the diagnosis and treatment of lumbar disc herniation with radiculopathy

4. Thromboembolic prophylaxis in spine surgery: an analysis of ChatGPT recommendations

Cited by 4 articles.

1. Assessing the accuracy and utility of ChatGPT responses to patient questions regarding posterior lumbar decompression;Artificial Intelligence Surgery;2024-09-04

2. Enhancing Diagnostic Support for Chiari Malformation and Syringomyelia: A Comparative Study of Contextualized ChatGPT Models;World Neurosurgery;2024-09

3. From the Editor-in-Chief: Featured Articles in the March 2024 Issue;Neurospine;2024-03-31

4. Accuracy of Different Generative Artificial Intelligence Models in Medical Question Answering: A Systematic Review and Network Meta-Analysis;2024