Performance of GPT-3.5 and GPT-4 on the Japanese Medical Licensing Examination: Comparison Study

Published:2023-06-29 Volume:9 Page:e48002
ISSN:2369-3762
Container-title:JMIR Medical Education
language:en
Short-container-title:JMIR Med Educ

Author:

Takagi Soshi, Watari Takashi, Erabi Ayano, Sakaguchi Kota

Abstract

Background: The competence of ChatGPT (Chat Generative Pre-Trained Transformer) in non-English languages is not well studied.

Objective: This study compared the performances of GPT-3.5 (Generative Pre-trained Transformer) and GPT-4 on the Japanese Medical Licensing Examination (JMLE) to evaluate the reliability of these models for clinical reasoning and medical knowledge in non-English languages.

Methods: This study used the default mode of ChatGPT, which is based on GPT-3.5; the GPT-4 model of ChatGPT Plus; and the 117th JMLE in 2023. A total of 254 questions were included in the final analysis, which were categorized into 3 types, namely general, clinical, and clinical sentence questions.

Results: The results indicated that GPT-4 outperformed GPT-3.5 in terms of accuracy, particularly for general, clinical, and clinical sentence questions. GPT-4 also performed better on difficult questions and specific disease questions. Furthermore, GPT-4 achieved the passing criteria for the JMLE, indicating its reliability for clinical reasoning and medical knowledge in non-English languages.

Conclusions: GPT-4 could become a valuable tool for medical education and clinical support in non–English-speaking regions, such as Japan.

Publisher

JMIR Publications Inc.

Subject

Education

References: 26 articles.

1. Introducing ChatGPT. OpenAI. 2022-11-30. https://openai.com/blog/chatgpt/

2. How Does ChatGPT Perform on the United States Medical Licensing Examination? The Implications of Large Language Models for Medical Education and Knowledge Assessment

Cited by 119 articles.

1. Evaluating the efficacy of leading large language models in the Japanese national dental hygienist examination: A comparative analysis of ChatGPT, Bard, and Bing Chat;Journal of Dental Sciences;2024-10

2. Analysis of Responses of GPT-4 V to the Japanese National Clinical Engineer Licensing Examination;Journal of Medical Systems;2024-09-11

3. ChatGPT as a global doctor: a rapid review of its performance on national licensing medical examination (Preprint);2024-08-29

4. Influence of Model Evolution and System Roles on ChatGPT’s Performance in Chinese Medical Licensing Exams: Comparative Study;JMIR Medical Education;2024-08-13

5. Potential of ChatGPT to Pass the Japanese Medical and Healthcare Professional National Licenses: A Literature Review;Cureus;2024-08-06