Evaluating AI Models for the National Pre-Medical Exam in India: A Head-to-Head Analysis of ChatGPT-3.5, GPT-4, and Bard (Preprint)

Published: 2023-08-02

Author:

Farhat Faiza, Chaudry Beenish M., Nadeem Mohammad, Sohail Shahab Saquib, Madsen Dag Øivind

Abstract

BACKGROUND

Large language models (LLMs) have revolutionized Natural Language Processing (NLP) with their ability to generate human-like text through extensive training on large datasets. These models, including ChatGPT-3.5, GPT-4, and Bard, find applications beyond NLP, attracting interest from academia and industry. Students are actively leveraging LLMs to enhance learning experiences and prepare for high-stakes exams, such as the National Eligibility cum Entrance Test (NEET) in India.

OBJECTIVE

This comparative analysis aims to evaluate the performance of ChatGPT-3.5, GPT-4, and Bard in answering NEET-2023 questions.

METHODS

In this paper, we test the performance of ChatGPT-3.5, GPT-4, and Bard on NEET-2023, the national pre-medical exam in India. The NEET questions were provided to these AI models, and their responses were recorded. Precision, recall, accuracy, and F1 score were used to evaluate the performance of all three models.
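As an illustration of how the four metrics named above might be computed for multiple-choice answers, the following is a minimal sketch, not the authors' procedure. The abstract does not state how precision and recall were defined, so this sketch assumes a common convention: each answer option is treated as a class and the per-option scores are macro-averaged. The function name `score_answers`, the option labels, and the sample answers are all hypothetical.

```python
from collections import Counter


def score_answers(predicted, answer_key, options=("A", "B", "C", "D")):
    """Return accuracy and macro-averaged precision, recall, and F1.

    ASSUMPTION: each answer option is treated as one class, and an
    unanswered question is recorded as None. The source abstract does
    not say which definition the study used.
    """
    if len(predicted) != len(answer_key):
        raise ValueError("predicted and answer_key must have the same length")

    tp, fp, fn = Counter(), Counter(), Counter()
    for guess, truth in zip(predicted, answer_key):
        if guess == truth:
            tp[truth] += 1          # correct answer for this option
        else:
            fn[truth] += 1          # the keyed option was missed
            if guess is not None:
                fp[guess] += 1      # a wrong option was chosen

    def ratio(num, den):
        # Guard against division by zero for options never predicted or keyed.
        return num / den if den else 0.0

    precisions = [ratio(tp[o], tp[o] + fp[o]) for o in options]
    recalls = [ratio(tp[o], tp[o] + fn[o]) for o in options]
    f1s = [ratio(2 * p * r, p + r) for p, r in zip(precisions, recalls)]

    n = len(options)
    return {
        "accuracy": ratio(sum(tp.values()), len(answer_key)),
        "precision": sum(precisions) / n,
        "recall": sum(recalls) / n,
        "f1": sum(f1s) / n,
    }


# Hypothetical answers for four questions; not data from the study.
key = ["A", "B", "C", "D"]
model_answers = ["A", "B", "C", "A"]
print(score_answers(model_answers, key))
```

Under a different convention, for example counting attempted-but-wrong answers as false positives and skipped questions as false negatives, the precision and recall values would differ, so the definition should be stated alongside any reported scores.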

RESULTS

GPT-4 demonstrated consistent superiority over Bard and ChatGPT-3.5 in all three subjects. Specifically, GPT-4 achieved accuracy rates of 72.5% in Physics, 44.44% in Chemistry, and 50.5% in Biology.

CONCLUSIONS

The study's findings provide valuable insights into the performance of ChatGPT-3.5, GPT-4, and Bard in answering NEET-2023 questions. GPT-4 emerged as the most accurate model, highlighting its potential for educational applications. The results underscore the suitability of LLMs for high-stakes exams and their positive impact on education. Additionally, the study establishes a benchmark for evaluating and enhancing LLMs' performance in educational tasks, promoting responsible and informed use of these models in diverse learning environments.

Publisher

JMIR Publications Inc.

References: 34 articles.

1. How trustworthy is ChatGPT? The case of bibliometric analyses

2. Leveraging ChatGPT and other generative artificial intelligence (AI)-based applications in the hospitality and tourism industry: practices, challenges and research agenda

3. ChatGPT and Vaccines: Can AI Chatbots Boost Awareness and Uptake?

4. Use of ChatGPT in ESP Teaching Process

5. AI as Agency Without Intelligence: on ChatGPT, Large Language Models, and Other Generative Models

Cited by 1 article.

1. Evaluating the performance of ChatGPT-4 on the United Kingdom Medical Licensing Assessment. Frontiers in Medicine. 2023-09-19