Abstract
Background
The spread of artificial intelligence (AI) has led to transformative advancements in diverse sectors, including healthcare. Generative writing systems in particular have shown potential in various applications, but their effectiveness in clinical settings has scarcely been investigated. In this context, we evaluated the proficiency of ChatGPT-4 in diagnosing gonarthrosis and coxarthrosis and in recommending appropriate treatments, compared with orthopaedic specialists.
Methods
A retrospective review was conducted using anonymized medical records of 100 patients previously diagnosed with either knee or hip arthrosis. ChatGPT-4 was employed to analyse these historical records, formulating both a diagnosis and potential treatment suggestions. Subsequently, a comparative analysis was conducted to assess the concordance between the AI’s conclusions and the original clinical decisions made by the physicians.
Results
In diagnostic evaluations, ChatGPT-4 consistently aligned with the conclusions previously drawn by physicians. In treatment recommendations, there was 83% agreement between the AI and the orthopaedic specialists. Therapeutic concordance was quantified by a Cohen's kappa coefficient of 0.580 (p < 0.001), indicating a moderate-to-good level of agreement. For recommendations of surgical treatment, the AI demonstrated a sensitivity of 78% and a specificity of 80%. Multivariable logistic regression showed that reduced quality of life (OR 49.97, p < 0.001) and start-up pain (OR 12.54, p = 0.028) significantly influenced ChatGPT-4's recommendation of surgery.
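For readers less familiar with the agreement statistics above, the following sketch shows how sensitivity, specificity, and Cohen's kappa are derived from a 2x2 table of surgical recommendations (specialist decision vs. AI recommendation). The cell counts below are hypothetical, chosen only so that the formulas reproduce values close to those reported; the paper's full contingency table is not given in the abstract.

```python
def agreement_stats(tp, fn, fp, tn):
    """Sensitivity, specificity, and Cohen's kappa from a 2x2 table.

    tp: specialist and AI both recommend surgery
    fn: specialist recommends surgery, AI does not
    fp: AI recommends surgery, specialist does not
    tn: neither recommends surgery
    """
    n = tp + fn + fp + tn
    sensitivity = tp / (tp + fn)   # AI recommends surgery when the specialist did
    specificity = tn / (tn + fp)   # AI withholds surgery when the specialist did
    po = (tp + tn) / n             # observed agreement
    # expected agreement under chance, from the marginal totals
    pe = ((tp + fn) * (tp + fp) + (fp + tn) * (fn + tn)) / n ** 2
    kappa = (po - pe) / (1 - pe)
    return sensitivity, specificity, kappa

# Hypothetical counts for illustration only (not the study's data):
sens, spec, kappa = agreement_stats(tp=39, fn=11, fp=10, tn=40)
print(f"sensitivity={sens:.2f} specificity={spec:.2f} kappa={kappa:.2f}")
```

Note how kappa corrects the raw agreement (po) for the agreement expected by chance (pe), which is why an 83% raw agreement can correspond to a kappa well below 0.83.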
Conclusion
This study emphasises ChatGPT-4’s notable potential in diagnosing conditions such as gonarthrosis and coxarthrosis and in aligning its treatment recommendations with those of orthopaedic specialists. However, it is crucial to acknowledge that AI tools such as ChatGPT-4 are not meant to replace the nuanced expertise and clinical judgment of seasoned orthopaedic surgeons, particularly in complex decision-making scenarios regarding treatment indications. Due to the exploratory nature of the study, further research with larger patient populations and more complex diagnoses is necessary to validate the findings and explore the broader potential of AI in healthcare.
Level of Evidence: III.
Publisher
Springer Science and Business Media LLC
Subject
Orthopedics and Sports Medicine; Surgery
Cited by 7 articles.