BACKGROUND
Few feasibility assessments have examined the use of large language models (LLMs) to answer questions from autistic patients in a Chinese-language context. Although Chinese is one of the most widely spoken languages globally, research on the application of these models in medicine has focused predominantly on English-speaking populations.
OBJECTIVE
To assess the effectiveness of LLM chatbots, specifically ChatGPT and ERNIE Bot, in answering questions from individuals with autism in a Chinese-language setting.
METHODS
A total of 100 patient consultation samples, comprising 239 questions, were randomly selected from publicly available autism-related records on DXY spanning January 2018 to August 2023. To maintain objectivity, the original questions and responses were anonymized and presented in randomized order. An evaluation team of three chief physicians assessed the responses across four dimensions: relevance, accuracy, usefulness, and empathy, yielding 717 evaluations in total. For each question, the team first identified the best response and then rated each response on a 5-point Likert scale, with each point representing a distinct level of quality. Finally, the responses from the three sources were compared.
RESULTS
Across the 717 evaluations, assessors preferred physicians' responses in 46.86% (95% CI, 43.21%–50.51%) of cases, ChatGPT's responses in 34.87% (95% CI, 31.38%–38.36%), and ERNIE Bot's responses in 18.27% (95% CI, 15.44%–21.10%). The mean relevance scores for physicians, ChatGPT, and ERNIE Bot were 3.75 (95% CI, 3.69–3.82), 3.69 (95% CI, 3.63–3.74), and 3.41 (95% CI, 3.35–3.46), respectively. For accuracy, physicians (3.66; 95% CI, 3.60–3.73) and ChatGPT (3.73; 95% CI, 3.69–3.77) outperformed ERNIE Bot (3.52; 95% CI, 3.47–3.57). For usefulness, physicians (3.54; 95% CI, 3.47–3.62) were rated higher than ChatGPT (3.40; 95% CI, 3.34–3.47) and ERNIE Bot (3.05; 95% CI, 2.99–3.12). For empathy, ChatGPT (3.64; 95% CI, 3.57–3.71) outperformed both physicians (3.13; 95% CI, 3.04–3.21) and ERNIE Bot (3.11; 95% CI, 3.04–3.18).
CONCLUSIONS
In this cross-sectional study, physicians' responses were superior overall in the Chinese-language context. Nonetheless, LLMs can provide valuable medical guidance to patients with autism and may even surpass physicians in demonstrating empathy. However, further optimization and research are needed before LLMs can be effectively integrated into clinical settings across diverse linguistic environments.
CLINICALTRIAL
The study was registered in the Chinese Clinical Trial Registry (ChiCTR2300074655).