Performance of ChatGPT Compared to Clinical Practice Guidelines in Making Informed Decisions for Lumbosacral Radicular Pain: A Cross-sectional Study

Published:2024-03 Issue:3 Volume:54 Page:222-228
ISSN:0190-6011
Container-title:Journal of Orthopaedic & Sports Physical Therapy
language:en
Short-container-title:Journal of Orthopaedic & Sports Physical Therapy

Author:

Gianola Silvia, Bargeri Silvia, Castellini Greta, Cook Chad, Palese Alvisa, Pillastrini Paolo, Salvalaggio Silvia, Turolla Andrea, Rossettini Giacomo

Abstract

OBJECTIVE: To compare the accuracy of an artificial intelligence chatbot to clinical practice guideline (CPG) recommendations for providing answers to complex clinical questions on lumbosacral radicular pain.

DESIGN: Cross-sectional study.

METHODS: We extracted recommendations from recent CPGs for diagnosing and treating lumbosacral radicular pain. Related clinical questions were developed and queried to OpenAI’s ChatGPT (GPT-3.5). We compared ChatGPT answers to CPG recommendations by assessing the (1) internal consistency of ChatGPT answers by measuring the percentage of text wording similarity when a clinical question was posed 3 times, (2) reliability between 2 independent reviewers in grading ChatGPT answers, and (3) accuracy of ChatGPT answers compared to CPG recommendations. Reliability was estimated using Fleiss’ kappa (κ) coefficients, and accuracy by interobserver agreement as the frequency of the agreements among all judgments.

RESULTS: We tested 9 clinical questions. The internal consistency of text ChatGPT answers was unacceptable across all 3 trials in all clinical questions (mean percentage of 49%, standard deviation of 15). Intrareliability (reviewer 1: κ = 0.90, standard error [SE] = 0.09; reviewer 2: κ = 0.90, SE = 0.10) and interreliability (κ = 0.85, SE = 0.15) between the 2 reviewers were “almost perfect.” Accuracy between ChatGPT answers and CPG recommendations was slight, demonstrating agreement in 33% of recommendations.

CONCLUSION: ChatGPT performed poorly in internal consistency and accuracy of the indications generated compared to clinical practice guideline recommendations for lumbosacral radicular pain.

J Orthop Sports Phys Ther 2024;54(3):222-228. Epub 29 January 2024. doi:10.2519/jospt.2024.12151
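The three measures named in the METHODS (text similarity across repeated answers, Fleiss' kappa, and frequency of agreement) can be illustrated with a minimal Python sketch. This is not the authors' analysis code: the abstract does not say which similarity tool was used, so `difflib.SequenceMatcher` is a stand-in assumption, the function names are hypothetical, and the code is only a sketch of the standard formulas.

```python
from collections import Counter
from difflib import SequenceMatcher
from itertools import combinations


def mean_pairwise_similarity(answers):
    """Mean percentage similarity over all pairs of answer texts.

    Assumption: difflib's ratio stands in for the unspecified
    'percentage of text wording similarity' in the abstract.
    """
    ratios = [SequenceMatcher(None, a, b).ratio()
              for a, b in combinations(answers, 2)]
    return 100.0 * sum(ratios) / len(ratios)


def percent_agreement(labels_a, labels_b):
    """Frequency of agreement: share of items where both labels match."""
    matches = sum(a == b for a, b in zip(labels_a, labels_b))
    return 100.0 * matches / len(labels_a)


def fleiss_kappa(ratings):
    """Fleiss' kappa for a list of items, each a list with one label per rater.

    Every item must have the same number of raters (at least 2).
    Undefined (division by zero) when all ratings fall in one category.
    """
    n_items = len(ratings)
    n_raters = len(ratings[0])
    category_totals = Counter()
    per_item_agreement = []
    for item in ratings:
        counts = Counter(item)
        category_totals.update(counts)
        squared = sum(c * c for c in counts.values())
        per_item_agreement.append(
            (squared - n_raters) / (n_raters * (n_raters - 1)))
    observed = sum(per_item_agreement) / n_items
    expected = sum((t / (n_items * n_raters)) ** 2
                   for t in category_totals.values())
    return (observed - expected) / (1 - expected)
```

With toy labels (not data from the study), `fleiss_kappa([["y", "y"], ["n", "n"], ["y", "n"], ["y", "y"]])` gives 7/15, or about 0.47, and `percent_agreement(["a", "b", "a", "b"], ["a", "b", "a", "a"])` gives 75.0.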

Publisher

Journal of Orthopaedic & Sports Physical Therapy (JOSPT)

Link

https://www.jospt.org/doi/pdf/10.2519/jospt.2024.12151

References: 44 articles.

1. Appropriateness of Recommendations Provided by ChatGPT to Interventional Radiologists

2. Can ChatGPT Accurately Answer a PICOT Question? Assessing AI Response to a Clinical Question

3. The AGREE Reporting Checklist: a tool to improve reporting of clinical practice guidelines

4. Accuracy of Information Provided by ChatGPT Regarding Liver Cancer Surveillance and Diagnosis

Cited by 3 articles.

1. Exploring ChatGPT’s potential in the clinical stream of neurorehabilitation; Frontiers in Artificial Intelligence; 2024-06-06

2. GastroBot: a Chinese gastrointestinal disease chatbot based on the retrieval-augmented generation; Frontiers in Medicine; 2024-05-22

3. Use of large language model-based chatbots in managing the rehabilitation concerns and education needs of outpatient stroke survivors and caregivers; Frontiers in Digital Health; 2024-05-09