ChatGPT failed Taiwan’s Family Medicine Board Exam

Authors:

Weng Tzu-Ling (1,2), Wang Ying-Mei (3,4,5), Chang Samuel (6), Chen Tzeng-Ji (7,8,9), Hwang Shinn-Jang (10,11)

Affiliations:

1. Center for Geriatrics and Gerontology, Taipei Veterans General Hospital, Taipei, Taiwan, ROC

2. Institute of Public Health, School of Medicine, National Yang Ming Chiao Tung University, Taipei, Taiwan, ROC

3. Department of Medical Education and Research, Taipei Veterans General Hospital Hsinchu Branch, Hsinchu, Taiwan, ROC

4. Department of Pharmacy, Taipei Veterans General Hospital Hsinchu Branch, Hsinchu, Taiwan, ROC

5. School of Medicine, National Tsing Hua University, Hsinchu, Taiwan, ROC

6. School of Medicine, Taipei Medical University, Taipei, Taiwan, ROC

7. Department of Family Medicine, Taipei Veterans General Hospital Hsinchu Branch, Hsinchu, Taiwan, ROC

8. Department of Family Medicine, Taipei Veterans General Hospital, Taipei, Taiwan, ROC

9. Department of Post-Baccalaureate Medicine, National Chung Hsing University, Taichung, Taiwan, ROC

10. Department of Family Medicine, En Chu Kong Hospital, New Taipei City, Taiwan, ROC

11. Department of Family Medicine, National Yang Ming Chiao Tung University, School of Medicine, Taipei, Taiwan, ROC

Abstract

Background: Chat Generative Pre-trained Transformer (ChatGPT; OpenAI Limited Partnership, San Francisco, CA, USA) is an artificial intelligence language model that has gained popularity because of its large training corpus and its ability to interpret and respond to a wide range of queries. Although researchers have tested it in various fields, its performance varies by domain. We aimed to further test its ability in the medical field.

Methods: We used the questions from Taiwan's 2022 Family Medicine Board Exam, which combined Chinese and English and covered various question types, including reverse (negative-phrase) questions and multiple-choice questions, mainly focused on general medical knowledge. We pasted each question into ChatGPT, recorded its response, and compared it with the correct answer provided by the exam board. We used SAS 9.4 (SAS Institute, Cary, North Carolina, USA) and Excel to calculate the accuracy rate for each question type.

Results: ChatGPT answered 52 of 125 questions correctly, for an overall accuracy rate of 41.6%. Question length did not affect accuracy. Accuracy rates were 45.5% for negative-phrase questions, 33.3% for multiple-choice questions, 58.3% for questions with mutually exclusive options, 50.0% for case-scenario questions, and 43.5% for questions on Taiwan's local policies, with no statistically significant difference among them.

Conclusion: ChatGPT's accuracy was not sufficient to pass Taiwan's Family Medicine Board Exam. Possible reasons include the difficulty level of a specialist exam and the relatively sparse traditional-Chinese content in ChatGPT's training data. However, ChatGPT performed acceptably on negative-phrase questions, mutually exclusive questions, and case-scenario questions, and it can be a helpful tool for learning and exam preparation. Future research can explore ways to improve ChatGPT's accuracy on specialized exams and in other domains.
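The scoring procedure described in the Methods (grade each ChatGPT response against the board's answer key, then compute an accuracy rate per question type) can be sketched in a few lines of Python. The function name and the (question_type, is_correct) data layout below are illustrative assumptions, not from the paper; the authors report using SAS 9.4 and Excel for the actual analysis.

```python
from collections import defaultdict

def accuracy_by_type(graded_items):
    """Compute overall and per-type accuracy from graded exam items.

    `graded_items` is a list of (question_type, is_correct) pairs --
    a hypothetical encoding of a grading sheet where each ChatGPT
    response has already been marked against the official answer key.
    """
    totals = defaultdict(int)
    correct = defaultdict(int)
    for qtype, is_correct in graded_items:
        totals[qtype] += 1
        if is_correct:
            correct[qtype] += 1
    overall = sum(correct.values()) / sum(totals.values())
    per_type = {t: correct[t] / totals[t] for t in totals}
    return overall, per_type
```

For example, a sheet with two negative-phrase items (one correct) and two case-scenario items (both correct) yields an overall accuracy of 0.75, with 0.5 for negative-phrase and 1.0 for case-scenario questions.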

Publisher

Ovid Technologies (Wolters Kluwer Health)

Subject

General Medicine

