Comparative Analysis of Responses to Online Autistic Patients' Questions in Chinese: Physicians vs. Large Language Model Chatbots (Preprint)

Authors:

He Wenjie, Zhang Wenyan, Jin Ya, Zhou Qiang, Zhang Huadan, Xia Qing

Abstract

BACKGROUND

There is a dearth of feasibility assessments regarding the use of large language models (LLMs) for responding to inquiries from autistic patients in a Chinese-language context. Although Chinese is one of the most widely spoken languages globally, research on the application of these models in medicine has focused predominantly on English-speaking populations.

OBJECTIVE

To assess the effectiveness of LLM chatbots, specifically ChatGPT and ERNIE Bot, in addressing inquiries from individuals with autism in a Chinese setting.

METHODS

A total of 100 patient consultation records, comprising 239 questions, were randomly selected from publicly available autism-related consultations on DXY spanning January 2018 to August 2023. To maintain objectivity, the original questions and responses were anonymized and presented in randomized order. An evaluation team of three chief physicians assessed each response on four dimensions: relevance, accuracy, usefulness, and empathy, yielding 717 evaluations in total. For each question, the team first identified the best response and then rated every response on a five-point Likert scale, with each point representing a distinct level of quality. Finally, the ratings were compared across the three response sources.

RESULTS

Across the 717 evaluations, physicians' responses were preferred in 46.86% (95% CI, 43.21%–50.51%) of cases, ChatGPT's in 34.87% (95% CI, 31.38%–38.36%), and ERNIE Bot's in 18.27% (95% CI, 15.44%–21.10%). The average relevance scores for physicians, ChatGPT, and ERNIE Bot were 3.75 (95% CI, 3.69–3.82), 3.69 (95% CI, 3.63–3.74), and 3.41 (95% CI, 3.35–3.46), respectively. Regarding accuracy ratings, physicians (3.66, 95% CI, 3.60–3.73) and ChatGPT (3.73, 95% CI, 3.69–3.77) outperformed ERNIE Bot (3.52, 95% CI, 3.47–3.57). In terms of usefulness scores, physicians (3.54, 95% CI, 3.47–3.62) received higher ratings than ChatGPT (3.40, 95% CI, 3.34–3.47) and ERNIE Bot (3.05, 95% CI, 2.99–3.12). Finally, on the empathy dimension, ChatGPT (3.64, 95% CI, 3.57–3.71) outperformed physicians (3.13, 95% CI, 3.04–3.21) and ERNIE Bot (3.11, 95% CI, 3.04–3.18).
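The abstract does not state how the confidence intervals for the preference proportions were computed; a standard normal-approximation (Wald) interval reproduces the reported bounds. The sketch below illustrates this for the physician-preference result, assuming the underlying counts (336, 250, and 131 of 717) inferred from the reported percentages.

```python
import math

def wald_ci(successes: int, n: int, z: float = 1.96):
    """Normal-approximation (Wald) 95% CI for a binomial proportion."""
    p = successes / n
    half_width = z * math.sqrt(p * (1 - p) / n)
    return p - half_width, p + half_width

# 336 of 717 evaluations favored physician responses (46.86%);
# the count is inferred from the reported percentage, not stated in the abstract.
lo, hi = wald_ci(336, 717)
print(f"{lo:.2%} – {hi:.2%}")  # → 43.21% – 50.51%
```

The same calculation with 250/717 and 131/717 recovers the intervals reported for ChatGPT and ERNIE Bot.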

CONCLUSIONS

In this cross-sectional study, physicians' responses were superior overall in the Chinese-language context examined. Nonetheless, LLMs can provide valuable medical guidance to patients with autism and may even surpass physicians in demonstrating empathy. Further optimization and research remain prerequisites for the effective integration of LLMs into clinical settings across diverse linguistic environments.

CLINICALTRIAL

The study was registered on chictr.org (ChiCTR2300074655).

Publisher

JMIR Publications Inc.
