Exploring the proficiency of ChatGPT-4: An evaluation of its performance in the Taiwan advanced medical licensing examination

Published:2024-01 Volume:10
ISSN:2055-2076
Container-title:DIGITAL HEALTH
language:en
Short-container-title:DIGITAL HEALTH

Author:

Lin Shih-Yi¹², Chan Pak Ki³, Hsu Wu-Huei¹⁴, Kao Chia-Hung¹³⁵⁶

Affiliation:

1. Graduate Institute of Clinical Medical Science, College of Medicine, China Medical University, Taichung, Taiwan

2. Division of Nephrology and Kidney Institute, China Medical University Hospital, Taichung, Taiwan

3. Artificial Intelligence Center, China Medical University Hospital, Taichung, Taiwan

4. Department of Chest Medicine, China Medical University Hospital, Taichung, Taiwan

5. Department of Nuclear Medicine and PET Center, China Medical University Hospital, Taichung, Taiwan

6. Department of Bioinformatics and Medical Engineering, Asia University, Taichung, Taiwan

Abstract

Background: Taiwan is well known for its quality healthcare system. The country's medical licensing exams offer a way to evaluate ChatGPT's medical proficiency. Methods: We analyzed exam data from February 2022, July 2022, February 2023, and July 2023. Each exam included four papers with 80 single-choice questions, grouped as descriptive or picture-based. We used ChatGPT-4 for evaluation. Incorrect answers prompted a “chain of thought” approach. Accuracy rates were calculated as percentages. Results: ChatGPT-4's accuracy in medical exams ranged from 63.75% to 93.75% (February 2022–July 2023). The highest accuracy (93.75%) was in February 2022's Medicine Exam (3). Subjects with the highest misanswered rates were ophthalmology (28.95%), breast surgery (27.27%), plastic surgery (26.67%), orthopedics (25.00%), and general surgery (24.59%). With “chain of thought” prompting, the “Accuracy of (CoT) prompting” ranged from 0.00% to 88.89%, and the final overall accuracy rate ranged from 90% to 98%. Conclusion: ChatGPT-4 succeeded in Taiwan's medical licensing exams. With the “chain of thought” prompt, it improved accuracy to over 90%.
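The percentages in the abstract can be related back to question counts with simple arithmetic. The sketch below is illustrative only, not the authors' code. It assumes that every one of the 80 questions in a paper is scored, and that the “Accuracy of (CoT) prompting” is the share of initially incorrect answers that became correct after the chain-of-thought prompt. The function names and the counts in the last example are hypothetical.

```python
QUESTIONS_PER_PAPER = 80  # per the abstract: 80 single-choice questions per paper


def accuracy_pct(correct: int, total: int = QUESTIONS_PER_PAPER) -> float:
    """Accuracy as a percentage of questions answered correctly."""
    return 100.0 * correct / total


def cot_accuracy_pct(recovered: int, initially_wrong: int) -> float:
    """Assumed definition: share of initially wrong answers fixed by CoT."""
    if initially_wrong == 0:
        return 0.0  # nothing to recover; a convention chosen for this sketch
    return 100.0 * recovered / initially_wrong


def final_accuracy_pct(initial_correct: int, recovered: int,
                       total: int = QUESTIONS_PER_PAPER) -> float:
    """Overall accuracy after adding the answers recovered by CoT."""
    return accuracy_pct(initial_correct + recovered, total)


# Under the 80-question assumption, the reported extremes map to whole counts.
print(accuracy_pct(75))  # 93.75
print(accuracy_pct(51))  # 63.75

# 88.89% is consistent with, for example, 8 of 9 wrong answers being recovered.
print(round(cot_accuracy_pct(8, 9), 2))  # 88.89

# Hypothetical paper: 70 correct at first, 6 of the 10 misses recovered by CoT.
print(final_accuracy_pct(70, 6))  # 95.0
```

Under these assumptions, a paper scored at 93.75% corresponds to 75 correct answers out of 80, and one scored at 63.75% to 51 out of 80.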

Publisher

SAGE Publications

Link

http://journals.sagepub.com/doi/pdf/10.1177/20552076241237678

Cited by 7 articles.

1. From GPT-3.5 to GPT-4.o: A Leap in AI’s Medical Exam Performance;Information;2024-09-05

2. ChatGPT as a global doctor: a rapid review of its performance on national licensing medical examination (Preprint);2024-08-29

3. Performance of ChatGPT Across Different Versions in Medical Licensing Examinations Worldwide: Systematic Review and Meta-Analysis;Journal of Medical Internet Research;2024-07-25

4. Assessing ChatGPT-4's Proficiency in English College Entrance Examinations Using Web Raschonline: A Comparative Study (Preprint);2024-07-19

5. How well do large language model-based chatbots perform in oral and maxillofacial radiology?;Dentomaxillofacial Radiology;2024-06-07