Performance of ChatGPT on Clinical Medicine Entrance Examination for Chinese Postgraduate in Chinese (Preprint)

Author:

Liu Xiao, Fang Changchang, Yan ZiWei, Liu Xiaoling, Jiang Yuan, Cao Zhengyu, Wu Maoxiong, Chen Zhiteng, Ma Jianyong, Yu Peng, Zhu Wengen, Chen Yangxin, Zhang Yuling, Ayiguli Abudukeremu, Wang Yue, Wang Jingfeng

Abstract

BACKGROUND

ChatGPT, an artificial intelligence (AI) system based on large-scale language models, has fueled interest in medical care. However, the ability of AI to understand and generate text in a given language is constrained by the quality and quantity of training data available for that language.

OBJECTIVE

This study aims to provide qualitative feedback on ChatGPT's problem-solving capabilities in medical education and clinical decision-making in Chinese.

METHODS

A dataset from the Clinical Medicine Entrance Examination for Chinese Postgraduate was used to assess the medical knowledge of ChatGPT 3.5 in the Chinese language. Accuracy, concordance (whether the explanation affirms the answer), and frequency of insights were used as indicators of ChatGPT's performance on the original and encoded medical questions.

RESULTS

According to our evaluation, ChatGPT received a score of 153.5/300 for the original questions in Chinese, which is slightly above the passing threshold of 129/300. Additionally, ChatGPT showed low accuracy in answering open-ended medical questions, with a total accuracy of 31.5%. However, ChatGPT demonstrated a commendable level of concordance (90% across all questions) and generated innovative insights for most problems (at least one significant insight for 80% of all questions).
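To put the reported score on a common scale, the arithmetic can be sketched as below. This is only an illustrative calculation on the numbers quoted above; the helper name `as_percent` and the variable names are assumptions for this sketch and do not come from the study.

```python
# Values quoted in the RESULTS paragraph (score, passing threshold, full marks).
SCORE = 153.5
PASSING = 129.0
FULL_MARKS = 300.0


def as_percent(points: float, total: float = FULL_MARKS) -> float:
    """Express a point total as a percentage of full marks, rounded to 1 decimal."""
    return round(100.0 * points / total, 1)


# Hypothetical helper applied to the reported figures.
score_pct = as_percent(SCORE)        # share of full marks obtained
passing_pct = as_percent(PASSING)    # share of full marks needed to pass
margin_points = SCORE - PASSING      # points above the passing threshold

print(f"score: {score_pct}% of full marks")
print(f"passing threshold: {passing_pct}% of full marks")
print(f"margin above threshold: {margin_points} points")
```

Under these figures the score corresponds to roughly half of full marks, and the margin over the passing threshold is 24.5 points.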

CONCLUSIONS

ChatGPT's performance in medical education and clinical decision-making was suboptimal in Chinese compared with that in English. However, ChatGPT demonstrated high internal concordance and generated multiple insights in the Chinese language. Further research should investigate language-based differences in ChatGPT's healthcare performance.

INTERNATIONAL REGISTERED REPORT

RR2-https://doi.org/10.1101/2023.04.12.23288452

Publisher

JMIR Publications Inc.
