Below average ChatGPT performance in medical microbiology exam compared to university students

Published:2023-12-21 Volume:8
ISSN:2504-284X
Container-title:Frontiers in Education
Short-container-title:Front. Educ.

Author:

Sallam Malik, Al-Salahat Khaled

Abstract

Background: The transformative potential of artificial intelligence (AI) in higher education is evident, with conversational models like ChatGPT poised to reshape teaching and assessment methods. The rapid evolution of AI models requires continuous evaluation. AI-based models can offer personalized learning experiences but raise accuracy concerns. Multiple-choice questions (MCQs) are widely used for competency assessment. The aim of this study was to evaluate ChatGPT performance in medical microbiology MCQs compared to the students’ performance.

Methods: The study employed an 80-MCQ dataset from a 2021 medical microbiology exam at the University of Jordan Doctor of Dental Surgery (DDS) Medical Microbiology 2 course. The exam contained 40 midterm and 40 final MCQs, authored by a single instructor without copyright issues. The MCQs were categorized based on the revised Bloom’s Taxonomy into four categories: Remember, Understand, Analyze, or Evaluate. Metrics, including facility index and discriminative efficiency, were derived from 153 midterm and 154 final exam DDS student performances. ChatGPT 3.5 was used to answer questions, and responses were assessed for correctness and clarity by two independent raters.

Results: ChatGPT 3.5 correctly answered 64 out of 80 medical microbiology MCQs (80%) but scored below the student average (80.5/100 vs. 86.21/100). Incorrect ChatGPT responses were more common in MCQs with longer choices (p = 0.025). ChatGPT 3.5 performance varied across cognitive domains: Remember (88.5% correct), Understand (82.4% correct), Analyze (75% correct), Evaluate (72% correct), with no statistically significant differences (p = 0.492). Correct ChatGPT responses received significantly higher average clarity and correctness scores than incorrect responses.

Conclusion: The study findings emphasized the need for ongoing refinement and evaluation of ChatGPT performance. ChatGPT 3.5 showed the potential to correctly and clearly answer medical microbiology MCQs; nevertheless, its performance fell below that of the students. Variability in ChatGPT performance across cognitive domains should be considered in future studies. The study insights could contribute to the ongoing evaluation of the role of AI-based models in educational assessment and to augmenting traditional methods in higher education.
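
As a rough illustration of the arithmetic behind two figures mentioned in the abstract, the sketch below computes an overall percentage of correct answers and a per-item facility index. It assumes the common definition of facility index as the percentage of examinees who answer an item correctly; the abstract does not state the exact formula used in the study. The function names and the ten-student response list are hypothetical and are not study data.

```python
from typing import Sequence


def percent_correct(num_correct: int, num_items: int) -> float:
    """Share of items answered correctly, as a percentage."""
    if num_items <= 0:
        raise ValueError("num_items must be positive")
    return 100.0 * num_correct / num_items


def facility_index(item_scores: Sequence[int]) -> float:
    """Percentage of examinees who answered one item correctly.

    Assumed definition: each score is 1 (correct) or 0 (incorrect).
    """
    if not item_scores:
        raise ValueError("item_scores must not be empty")
    return 100.0 * sum(item_scores) / len(item_scores)


# Figure reported in the abstract: 64 of 80 MCQs answered correctly.
chatgpt_accuracy = percent_correct(64, 80)

# Hypothetical responses of ten students to a single MCQ (not study data).
hypothetical_item = [1, 1, 0, 1, 1, 1, 0, 1, 1, 1]
item_facility = facility_index(hypothetical_item)

print(f"ChatGPT accuracy: {chatgpt_accuracy:.1f}%")
print(f"Facility index of the hypothetical item: {item_facility:.1f}%")
```

A high facility index marks an item that most examinees answered correctly, so it is usually read as an indicator of item easiness rather than difficulty.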

Publisher

Frontiers Media SA

Subject

Education

References: 72 articles.

1. Sailing the seven seas: a multinational comparison of ChatGPT's performance on medical licensing examinations;Alfertshofer;Ann. Biomed. Eng.,2023

2. Evaluating the performance of ChatGPT in ophthalmology: an analysis of its successes and shortcomings;Antaki;Ophthalmol. Sci.,2023

3. Exploring the possible use of AI Chatbots in public health education: feasibility study;Baglivo;JMIR Med. Educ.,2023

Cited by 10 articles.

1. The performance of OpenAI ChatGPT-4 and Google Gemini in virology multiple-choice questions: a comparative analysis of English and Arabic responses;BMC Research Notes;2024-09-03

2. Assessment Study of ChatGPT-3.5’s Performance on the Final Polish Medical Examination: Accuracy in Answering 980 Questions;Healthcare;2024-08-16

3. Comparing the Performance of ChatGPT-4 and Medical Students on MCQs at Varied Levels of Bloom’s Taxonomy;Advances in Medical Education and Practice;2024-05

4. A Preliminary Checklist (METRICS) to Standardize the Design and Reporting of Studies on Generative Artificial Intelligence–Based Models in Health Care Education and Practice: Development Study Involving a Literature Review;Interactive Journal of Medical Research;2024-02-15

5. A multinational study on the factors influencing university students’ attitudes and usage of ChatGPT;Scientific Reports;2024-01-23