Efficacy and limitations of ChatGPT as a biostatistical problem-solving tool in medical education in Serbia: a descriptive study

Published: 2023-10-16
Volume: 20; Page: 28
ISSN: 1975-5937
Journal: Journal of Educational Evaluation for Health Professions (abbreviated J Educ Eval Health Prof)
Language: English

Authors:

Aleksandra Ignjatović, Lazar Stevanović

Abstract

Purpose: This study aimed to assess the performance of ChatGPT (GPT-3.5 and GPT-4) as a study tool for solving biostatistical problems and to identify potential drawbacks of using ChatGPT in medical education, particularly for solving practical biostatistical problems.

Methods: In this descriptive study, ChatGPT was tested on its ability to solve biostatistical problems from the Handbook of Medical Statistics by Peacock and Peacock. Tables in the problems were converted into textual questions, and 10 randomly chosen problems were entered as text-based input in conversations with ChatGPT (versions 3.5 and 4).

Results: GPT-3.5 solved 5 practical problems on the first attempt, covering categorical data, cross-sectional studies, measuring reliability, probability properties, and the t-test. Within 3 attempts, GPT-3.5 failed to provide correct answers on analysis of variance, the chi-square test, and sample size. GPT-4 also solved the confidence interval problem on the first attempt and, with precise guidance and monitoring, solved all problems within 3 attempts.

Conclusion: Across the 10 biostatistical problems, both versions of ChatGPT performed below average, answering 5 (GPT-3.5) and 6 (GPT-4) of 10 correctly on the first attempt. GPT-4 provided correct answers to all problems within 3 attempts. These findings indicate that students must be aware that ChatGPT can be wrong even when it selects and calculates various statistical analyses, and that they should be cautious when incorporating this model into medical education.

Funder

Ministry of Education, Science and Technological Development

Publisher

Korea Health Personnel Licensing Examination Institute

Subject

Education, General Health Professions

Link

http://jeehp.org/upload/pdf/jeehp-20-28.pdf

References (15 articles; 4 listed here)

1. Perception, performance, and detectability of conversational artificial intelligence across 32 university courses

2. ChatGPT versus human in generating medical graduate exam multiple choice questions—A multinational prospective study (Hong Kong S.A.R., Singapore, Ireland, and the United Kingdom)

3. Performance of ChatGPT on USMLE: Potential for AI-assisted medical education using large language models

4. How Does ChatGPT Perform on the United States Medical Licensing Examination? The Implications of Large Language Models for Medical Education and Knowledge Assessment

Cited by (8 articles; 5 listed here)

1. In-depth analysis of ChatGPT’s performance based on specific signaling words and phrases in the question stem of 2377 USMLE step 1 style questions. Scientific Reports, 2024-06-12.

2. Comparing the Performance of ChatGPT-4 and Medical Students on MCQs at Varied Levels of Bloom’s Taxonomy. Advances in Medical Education and Practice, 2024-05.

3. Revolutionizing Cardiology With Words: Unveiling the Impact of Large Language Models in Medical Science Writing. Canadian Journal of Cardiology, 2024-05.

4. Large Language Models in Education: A Systematic Review. 2024 6th International Conference on Computer Science and Technologies in Education (CSTE), 2024-04-19.

5. ChatGPT in medicine: prospects and challenges: a review article. International Journal of Surgery, 2024-03-19.