Benchmarking Large Language Models for Cervical Spondylosis

Published: 2024-08-05 Volume: 8 Page: e55577
ISSN: 2561-326X
Container-title: JMIR Formative Research
Language: en
Short-container-title: JMIR Form Res

Author:

Zhang Boyan, Du Yueqi, Duan Wanru, Chen Zan

Abstract

Cervical spondylosis is the most common degenerative spinal disorder in modern societies. Patients require a great deal of medical knowledge, and large language models (LLMs) offer them a novel and convenient tool for accessing medical advice. In this study, we collected the questions most frequently asked by patients with cervical spondylosis in clinical work and internet consultations. The accuracy of the answers provided by the LLMs was evaluated and graded by 3 experienced spinal surgeons. Comparative analysis of the responses showed that all LLMs provided satisfactory results and that GPT-4 had the highest accuracy rate. Variation across question sections in all LLMs revealed the boundaries of their abilities and the direction of development for artificial intelligence.

Publisher

JMIR Publications Inc.
