The scientific knowledge of three large language models in cardiology: multiple-choice questions examination-based performance

Published: 2024-05-06
Issue: 6
Volume: 86
Pages: 3261-3266
ISSN: 2049-0801
Container-title: Annals of Medicine & Surgery
Language: en
Author:

Altamimi Ibraheem¹,², Alhumimidi Abdullah¹, Alshehri Salem¹, Alrumayan Abdullah³, Al-khlaiwi Thamir⁴, Meo Sultan A.⁴, Temsah Mohamad-Hani¹,²,⁵

Affiliation:

1. College of Medicine

2. Evidence-Based Health Care and Knowledge Translation Research Chair, Family and Community Medicine Department, College of Medicine, King Saud University

3. College of Medicine, King Saud Bin Abdulaziz University for Health and Sciences, Riyadh, Saudi Arabia

4. Department of Physiology

5. Pediatric Intensive Care Unit, Pediatric Department, College of Medicine, King Saud University Medical City

Abstract

Background: The integration of artificial intelligence (AI) chatbots like Google’s Bard, OpenAI’s ChatGPT, and Microsoft’s Bing Chatbot into academic and professional domains, including cardiology, has been rapidly evolving. Their application in educational and research frameworks, however, raises questions about their efficacy, particularly in specialized fields like cardiology. This study aims to evaluate the knowledge depth and accuracy of these AI chatbots in cardiology using a multiple-choice question (MCQ) format.

Methods: The study was conducted as an exploratory, cross-sectional study in November 2023 on a bank of 100 MCQs covering various cardiology topics that was created from authoritative textbooks and question banks. These MCQs were then used to assess the knowledge level of Google’s Bard, Microsoft Bing, and ChatGPT 4.0. Each question was entered manually into the chatbots, ensuring no memory retention bias.

Results: The study found that ChatGPT 4.0 demonstrated the highest knowledge score in cardiology, with 87% accuracy, followed by Bing at 60% and Bard at 46%. The performance varied across different cardiology subtopics, with ChatGPT consistently outperforming the others. Notably, the study revealed significant differences in the proficiency of these chatbots in specific cardiology domains.

Conclusion: This study highlights a spectrum of efficacy among AI chatbots in disseminating cardiology knowledge. ChatGPT 4.0 emerged as a potential auxiliary educational resource in cardiology, surpassing traditional learning methods in some aspects. However, the variability in performance among these AI systems underscores the need for cautious evaluation and continuous improvement, especially for chatbots like Bard, to ensure reliability and accuracy in medical knowledge dissemination.
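The accuracy figures reported in the abstract are proportions of correctly answered MCQs, overall and per cardiology subtopic. The abstract does not describe the authors' scoring procedure, so the following is only a minimal sketch of how such scores could be tallied. The function name `score_responses`, the dictionary layout, and the four toy questions are all assumptions made for illustration and are not the study's data or method.

```python
from collections import defaultdict


def score_responses(answer_key, topics, responses):
    """Return (overall_accuracy, per_topic_accuracy) for one chatbot.

    answer_key: dict mapping question id -> correct option letter
    topics:     dict mapping question id -> subtopic name
    responses:  dict mapping question id -> option letter the chatbot chose
                (a missing answer is counted as incorrect)
    """
    correct = defaultdict(int)
    total = defaultdict(int)
    for qid, key in answer_key.items():
        topic = topics[qid]
        total[topic] += 1
        given = responses.get(qid, "")
        # Compare option letters case-insensitively, ignoring stray spaces.
        if given.strip().upper() == key.strip().upper():
            correct[topic] += 1
    overall = sum(correct.values()) / sum(total.values())
    per_topic = {t: correct[t] / total[t] for t in total}
    return overall, per_topic


# Toy illustration only: four made-up questions, not the study's MCQ bank.
answer_key = {1: "A", 2: "C", 3: "B", 4: "D"}
topics = {1: "arrhythmia", 2: "arrhythmia", 3: "heart failure", 4: "heart failure"}
responses = {1: "a", 2: "B", 3: "B", 4: "D"}

overall, per_topic = score_responses(answer_key, topics, responses)
print(f"overall accuracy: {overall:.0%}")
for topic, acc in sorted(per_topic.items()):
    print(f"{topic}: {acc:.0%}")
```

With a 100-question bank, the overall value corresponds directly to a percentage such as those quoted in the Results. The per-topic dictionary is one way to surface the subtopic variation the abstract mentions.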

Publisher

Ovid Technologies (Wolters Kluwer Health)
