Affiliation:
1. Department of Orthopedics and Trauma, Medical University of Graz, Auenbruggerplatz 5, 8036 Graz, Austria
2. Division of Plastic, Aesthetic and Reconstructive Surgery, Department of Surgery, Medical University of Graz, Auenbruggerplatz 29/4, 8036 Graz, Austria
3. Department of Dermatology and Venereology, Medical University of Graz, Auenbruggerplatz 8, 8036 Graz, Austria
Abstract
Background: This study aimed to evaluate ChatGPT’s performance on questions about periprosthetic joint infections (PJI) of the hip and knee.
Methods: Twenty-seven questions from the 2018 International Consensus Meeting on Musculoskeletal Infection were selected for response generation. The free-text responses were evaluated by three orthopedic surgeons using a five-point Likert scale. Inter-rater reliability (IRR) was assessed via Fleiss’ kappa (FK).
Results: Overall, near-perfect IRR was found for disagreement on the presence of factual errors (FK: 0.880, 95% CI [0.724, 1.035], p < 0.001) and agreement on information completeness (FK: 0.848, 95% CI [0.699, 0.996], p < 0.001). Substantial IRR was observed for disagreement on misleading information (FK: 0.743, 95% CI [0.601, 0.886], p < 0.001) and agreement on suitability for patients (FK: 0.627, 95% CI [0.478, 0.776], p < 0.001). Moderate IRR was observed for agreement on “up-to-dateness” (FK: 0.584, 95% CI [0.434, 0.734], p < 0.001) and suitability for orthopedic surgeons (FK: 0.505, 95% CI [0.383, 0.628], p < 0.001). Question- and subtopic-specific analysis revealed diverse IRR levels ranging from near-perfect to poor.
Conclusions: ChatGPT’s free-text responses to complex orthopedic questions were predominantly reliable and useful for orthopedic surgeons and patients. Given variations in performance by question and subtopic, consulting additional sources and exercising careful interpretation should be emphasized for reliable medical decision-making.
Cited by: 5 articles.