The Potential of ChatGPT as a Self-Diagnostic Tool in Common Orthopedic Diseases: Exploratory Study

Published: 2023-09-15 Volume: 25 Page: e47621
ISSN: 1438-8871
Container-title: Journal of Medical Internet Research
Language: en
Short-container-title: J Med Internet Res

Author:

Kuroiwa Tomoyuki, Sarcon Aida, Ibara Takuya, Yamada Eriku, Yamamoto Akiko, Tsukamoto Kazuya, Fujita Koji

Abstract

Background: Artificial intelligence (AI) has gained tremendous popularity recently, especially the use of natural language processing (NLP). ChatGPT is a state-of-the-art chatbot capable of creating natural conversations using NLP. The use of AI in medicine can have a tremendous impact on health care delivery. Although some studies have evaluated ChatGPT’s accuracy in self-diagnosis, there is no research regarding its precision and the degree to which it recommends medical consultations.

Objective: The aim of this study was to evaluate ChatGPT’s ability to accurately and precisely self-diagnose common orthopedic diseases, as well as the degree of recommendation it provides for medical consultations.

Methods: Over a 5-day course, each of the study authors submitted the same questions to ChatGPT. The conditions evaluated were carpal tunnel syndrome (CTS), cervical myelopathy (CM), lumbar spinal stenosis (LSS), knee osteoarthritis (KOA), and hip osteoarthritis (HOA). Answers were categorized as correct, partially correct, incorrect, or a differential diagnosis. The percentage of correct answers and the reproducibility were calculated. Reproducibility between days and between raters was calculated using the Fleiss κ coefficient. Answers that recommended that the patient seek medical attention were recategorized according to the strength of the recommendation as defined by the study.

Results: The ratios of correct answers were 25/25, 1/25, 24/25, 16/25, and 17/25 for CTS, CM, LSS, KOA, and HOA, respectively. The ratios of incorrect answers were 23/25 for CM and 0/25 for all other conditions. The reproducibility between days was 1.0, 0.15, 0.7, 0.6, and 0.6 for CTS, CM, LSS, KOA, and HOA, respectively. The reproducibility between raters was 1.0, 0.1, 0.64, –0.12, and 0.04 for CTS, CM, LSS, KOA, and HOA, respectively. Among the answers recommending medical attention, the phrases “essential,” “recommended,” “best,” and “important” were used. Specifically, “essential” occurred in 4 out of 125, “recommended” in 12 out of 125, “best” in 6 out of 125, and “important” in 94 out of 125 answers. Additionally, 7 out of the 125 answers did not include a recommendation to seek medical attention.

Conclusions: The accuracy and reproducibility of ChatGPT in self-diagnosing five common orthopedic conditions were inconsistent. The accuracy could potentially be improved by adding symptoms that could easily identify a specific location. Only a few answers were accompanied by a strong recommendation to seek medical attention according to our study standards. Although ChatGPT could serve as a potential first step in accessing care, we found variability in accurate self-diagnosis. Given the risk of harm with self-diagnosis without medical follow-up, it would be prudent for an NLP-based tool to include clear language alerting patients to seek expert medical opinions. We hope to shed further light on the use of AI in a future clinical study.
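The Fleiss κ coefficient named in the Methods can be illustrated with a minimal Python sketch. This is not the authors' code: the function name `fleiss_kappa`, the data layout (one row per rated item, one label per repeated rating), and the example labels are assumptions made for illustration, and the data are hypothetical rather than taken from the study. κ is mathematically undefined when every rating falls into a single category; returning 1.0 in that case is a convention assumed here.

```python
import math
from collections import Counter


def fleiss_kappa(ratings):
    """Fleiss' kappa for a list of items, each a list of category labels.

    Every item must carry the same number of ratings (at least 2).
    """
    if not ratings:
        raise ValueError("ratings must not be empty")
    n_items = len(ratings)
    n_ratings = len(ratings[0])
    if n_ratings < 2 or any(len(row) != n_ratings for row in ratings):
        raise ValueError("every item needs the same number (>= 2) of ratings")

    # Per-item tally of how many ratings fell into each category.
    counts = [Counter(row) for row in ratings]
    categories = set().union(*counts)

    # Observed agreement: mean over items of the share of agreeing rating pairs.
    p_bar = sum(
        (sum(n * n for n in c.values()) - n_ratings)
        / (n_ratings * (n_ratings - 1))
        for c in counts
    ) / n_items

    # Chance agreement: sum of squared overall category proportions.
    total = n_items * n_ratings
    p_e = sum((sum(c[cat] for c in counts) / total) ** 2 for cat in categories)

    # Assumed convention: all ratings in one category counts as full agreement.
    if math.isclose(p_e, 1.0):
        return 1.0
    return (p_bar - p_e) / (1.0 - p_e)


# Hypothetical example: 4 rated items, 3 repeated ratings each.
example = [
    ["correct", "correct", "correct"],
    ["correct", "partially correct", "correct"],
    ["incorrect", "incorrect", "incorrect"],
    ["correct", "differential", "partially correct"],
]
print(round(fleiss_kappa(example), 3))
```

Values near 1 indicate agreement well above chance, values near 0 indicate agreement no better than chance, and negative values indicate agreement below chance, which is how the reported range from 1.0 down to –0.12 can be read.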

Publisher

JMIR Publications Inc.

Subject

Health Informatics

Cited by 33 articles.

1. Beyond the Scalpel: Assessing ChatGPT's potential as an auxiliary intelligent virtual assistant in oral surgery; Computational and Structural Biotechnology Journal; 2024-12

2. Large Language Models in Biomedical and Health Informatics: A Review with Bibliometric Analysis; Journal of Healthcare Informatics Research; 2024-09-14

3. The Diagnostic Ability of GPT-3.5 and GPT-4.0 in Surgery: Comparative Analysis; Journal of Medical Internet Research; 2024-09-10

4. Quality and Accountability of ChatGPT in Health Care in Low- and Middle-Income Countries: Simulated Patient Study; Journal of Medical Internet Research; 2024-09-09

5. Editorial Commentary: The Scope of Medical Research Concerning ChatGPT Remains Limited by Lack of Originality and Leaves More Questions Than Answers; Arthroscopy: The Journal of Arthroscopic & Related Surgery; 2024-09