ChatGPT Yields a Passing Score on a Pediatric Board Preparatory Exam but Raises Red Flags

Published: 2024-01
Volume: 11
ISSN: 2333-794X
Journal: Global Pediatric Health
Language: English

Authors:

Le Mindy¹, Davis Michael¹

Affiliation:

1. University of Florida College of Medicine, Gainesville, FL, USA

Abstract

Objectives: We aimed to evaluate the performance of a publicly available online artificial intelligence program (OpenAI’s ChatGPT-3.5 and ChatGPT-4.0, August 3 versions) on a pediatric board preparatory examination, the American Academy of Pediatrics (AAP) 2021 and 2022 PREP® Self-Assessments.

Methods: In September 2023, we entered 245 questions and answer choices from the Pediatrics 2021 PREP® Self-Assessment and 247 questions and answer choices from the Pediatrics 2022 PREP® Self-Assessment into OpenAI’s ChatGPT-3.5 and ChatGPT-4.0 (August 3 versions). The ChatGPT-3.5 and ChatGPT-4.0 scores were compared with the advertised passing score for the PREP® exams (70% or higher) and with the average scores of the 10 715 and 6825 first-time human test takers (74.09% and 75.71%, respectively).

Results: For the AAP 2021 and 2022 PREP® Self-Assessments, ChatGPT-3.5 answered 143 of 243 (58.85%) and 137 of 247 (55.46%) questions correctly on a single attempt. ChatGPT-4.0 answered 193 of 243 (79.84%) and 208 of 247 (84.21%) questions correctly.

Conclusion: Using a publicly available online chatbot to answer pediatric board preparatory examination questions yielded a passing score with ChatGPT-4.0 but demonstrated significant limitations in the chatbot’s ability to assess some complex medical situations in children, posing a potential risk to this vulnerable population.

Publisher

SAGE Publications

Link

https://journals.sagepub.com/doi/pdf/10.1177/2333794X241240327

Cited by 2 articles:

1. From GPT-3.5 to GPT-4.o: A Leap in AI’s Medical Exam Performance. Information, 2024-09-05.

2. A Systematic Review and Comprehensive Analysis of Pioneering AI Chatbot Models from Education to Healthcare: ChatGPT, Bard, Llama, Ernie and Grok. Future Internet, 2024-06-22.