Are Generative Pretrained Transformer 4 Responses to Developmental Dysplasia of the Hip Clinical Scenarios Universal? An International Review

Author:

Luo Shaoting1, Canavese Federico2, Aroojis Alaric2, Andreacchio Antonio2, Anticevic Darko3, Bouchard Maryse4, Castaneda Pablo2, De Rosa Vincenzo3, Fiogbe Michel Armand4, Frick Steven L.2, Hui James H.5, Johari Ashok N.3, Loro Antonio4, Lyu Xuemin2, Matsushita Masaki5, Omeroglu Hakan4, Roye David P.5, Shah Maulin M.4, Yong Bicheng6, Li Lianyong1

Affiliation:

1. Department of Pediatric Orthopaedics, Shengjing Hospital of China Medical University, Shenyang, Liaoning

2. Department of Orthopaedic Surgery, School of Medicine, Stanford University, Palo Alto, CA

3. Pediatric Orthopedics Clinic of Pediatric Surgery and Orthopedics, Pediatric Institute of Southern Switzerland (IPSI), Via Athos Gallino, Bellinzona, Switzerland

4. Ufuk University Faculty of Medicine, Ankara, Turkey

5. Department of Orthopaedic Surgery, Nagoya University Graduate School of Medicine, Nagoya, Aichi, Japan

6. Department of Pediatric Orthopaedics, Beit CURE Children’s Hospital of Malawi, Chichiri Blantyre, Malawi

Abstract

Objective: There is increasing interest in applying artificial intelligence chatbots such as generative pretrained transformer 4 (GPT-4) in the medical field. This study aimed to explore the universality of GPT-4 responses to simulated clinical scenarios of developmental dysplasia of the hip (DDH) across diverse global settings.

Methods: Seventeen international experts with more than 15 years of experience in pediatric orthopaedics were selected for the evaluation panel. Eight simulated DDH clinical scenarios were created, covering 4 key areas: (1) initial evaluation and diagnosis, (2) initial examination and treatment, (3) nursing care and follow-up, and (4) prognosis and rehabilitation planning. Each scenario was completed independently in a new GPT-4 session. Interrater reliability was assessed using Fleiss kappa, and the quality, relevance, and applicability of GPT-4 responses were analyzed using median scores and interquartile ranges. Following scoring, experts met in Zoom sessions to generate Regional Consensus Assessment Scores, which were intended to represent a consistent regional assessment of the use of GPT-4 in pediatric orthopaedic care.

Results: GPT-4's responses to the 8 clinical DDH scenarios received performance scores ranging from 44.3% to 98.9% of the 88-point maximum. The Fleiss kappa statistic of 0.113 (P = 0.001) indicated low agreement among experts in their ratings. For quality, relevance, and applicability, the median score was 3 in each case, with interquartile ranges of 3 to 4, 3 to 4, and 2 to 3, respectively. Significant differences were noted in the prognosis and rehabilitation domain scores (P < 0.05 for all). Regional consensus scores were 75 for Africa, 74 for Asia, 73 for India, 80 for Europe, and 65 for North America, with the Kruskal-Wallis test highlighting significant disparities between these regions (P = 0.034).

Conclusions: This study demonstrates the promise of GPT-4 in pediatric orthopaedic care, particularly in supporting preliminary DDH assessments and guiding treatment strategies for specialist care. However, effective integration of GPT-4 into clinical practice will require adaptation to specific regional health care contexts, highlighting the importance of a nuanced approach to health technology adaptation.

Level of Evidence: Level IV.
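
The abstract reports interrater reliability as a Fleiss kappa of 0.113. For readers unfamiliar with the statistic, the following is a minimal sketch of the standard Fleiss kappa formula in plain Python. It is not the authors' analysis code, and the function name `fleiss_kappa` and the ratings table `hypothetical_counts` are invented for illustration; they are not study data.

```python
from typing import Sequence


def fleiss_kappa(counts: Sequence[Sequence[int]]) -> float:
    """Fleiss kappa for a table where counts[i][j] is the number of
    raters who assigned item i to category j.

    Every item must be rated by the same number of raters.
    """
    if not counts:
        raise ValueError("need at least one rated item")
    n_raters = sum(counts[0])
    if n_raters < 2:
        raise ValueError("need at least two raters per item")
    if any(sum(row) != n_raters for row in counts):
        raise ValueError("every item must have the same number of raters")

    n_items = len(counts)
    n_categories = len(counts[0])

    # Observed agreement: for each item, the share of rater pairs that agree.
    pairs = n_raters * (n_raters - 1)
    per_item = [(sum(c * c for c in row) - n_raters) / pairs for row in counts]
    p_observed = sum(per_item) / n_items

    # Chance agreement: sum of squared overall category proportions.
    totals = [sum(row[j] for row in counts) for j in range(n_categories)]
    p_chance = sum((t / (n_items * n_raters)) ** 2 for t in totals)

    if p_chance == 1.0:
        raise ValueError("kappa is undefined when all ratings fall in one category")
    return (p_observed - p_chance) / (1.0 - p_chance)


# Hypothetical table (NOT study data): 4 items, 5 raters, 3 score categories.
hypothetical_counts = [
    [3, 2, 0],
    [1, 3, 1],
    [0, 2, 3],
    [2, 2, 1],
]
print(f"Fleiss kappa: {fleiss_kappa(hypothetical_counts):.3f}")
```

Values near 0 indicate agreement close to what chance alone would produce, which is consistent with the abstract's reading of 0.113 as low agreement among the experts.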

Publisher

Ovid Technologies (Wolters Kluwer Health)
