Assessment of ChatGPT-3.5's Knowledge in Oncology: Comparative Study with ASCO-SEP Benchmarks

Authors:

Roupen Odabashian, Donald Bastin, Georden Jones, Maria Manzoor, Sina Tangestaniapour, Malke Assad, Sunita Lakhani, Maritsa Odabashian, Sharon McGee

Abstract

Background: ChatGPT (OpenAI) is a state-of-the-art large language model that uses artificial intelligence (AI) to address questions across diverse topics. The American Society of Clinical Oncology (ASCO) created the Self-Evaluation Program (ASCO-SEP), a comprehensive educational program to help physicians keep up to date with the many rapid advances in the field. Its question bank consists of multiple-choice questions addressing the many facets of cancer care, including diagnosis, treatment, and supportive care. As applications of ChatGPT rapidly expand, it becomes vital to ascertain whether the knowledge of ChatGPT-3.5 matches the established standards that oncologists are recommended to follow.

Objective: This study aims to evaluate whether ChatGPT-3.5's knowledge aligns with the established benchmarks that oncologists are expected to adhere to, providing a deeper understanding of the potential applications of this tool as a support for clinical decision-making.

Methods: We conducted a systematic assessment of the performance of ChatGPT-3.5 on the ASCO-SEP, the leading educational and assessment tool for medical oncologists in training and practice. A total of 1040 multiple-choice questions covering the spectrum of cancer care were extracted. Questions were categorized by cancer type or discipline, with subcategorization as treatment, diagnosis, or other. Answers were scored as correct if ChatGPT-3.5 selected the answer defined as correct by ASCO-SEP.

Results: Overall, ChatGPT-3.5 answered 56.1% (583/1040) of questions correctly. Accuracy varied across cancer types and disciplines: it was highest for questions on developmental therapeutics (8/10; 80% correct) and lowest for questions on gastrointestinal cancer (102/209; 48.8% correct). There was no significant difference in performance across the predefined subcategories of diagnosis, treatment, and other (P=.16).

Conclusions: This study evaluated ChatGPT-3.5's oncology knowledge using the ASCO-SEP, aiming to address uncertainties regarding AI tools like ChatGPT in clinical decision-making. Our findings suggest that while ChatGPT-3.5 offers a promising outlook for AI in oncology, its current performance on ASCO-SEP questions requires further refinement to reach the requisite competency levels. Future assessments could explore ChatGPT's clinical decision support capabilities with real-world clinical scenarios, its ease of integration into medical workflows, and its potential to foster interdisciplinary collaboration and patient engagement in health care settings.
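To make the reported analysis concrete, the following is a minimal Python sketch of the scoring and significance testing described in the abstract. The data file, column names, and the choice of a chi-square test of independence are assumptions for illustration (the abstract reports only the aggregate figures and the P value), not the authors' actual code.

```python
# Minimal sketch of the scoring and significance analysis described above.
# The data file and column names are hypothetical; per-subcategory counts
# are not reported in the abstract.
import pandas as pd
from scipy.stats import chi2_contingency

# Each row: one ASCO-SEP question, its category (e.g., "gastrointestinal"),
# its subcategory ("diagnosis", "treatment", or "other"), and whether
# ChatGPT-3.5 selected the ASCO-SEP-defined answer (1) or not (0).
df = pd.read_csv("asco_sep_results.csv")

# Overall accuracy (the study reports 583/1040 = 56.1%).
overall = df["correct"].mean()
print(f"Overall accuracy: {df['correct'].sum()}/{len(df)} = {overall:.1%}")

# Accuracy by cancer type or discipline.
by_category = df.groupby("category")["correct"].agg(["sum", "count", "mean"])
print(by_category.sort_values("mean", ascending=False))

# Chi-square test of independence: does accuracy differ across the
# diagnosis/treatment/other subcategories? (The study reports P=.16.)
contingency = pd.crosstab(df["subcategory"], df["correct"])
chi2, p, dof, _ = chi2_contingency(contingency)
print(f"chi2={chi2:.2f}, dof={dof}, P={p:.2f}")
```

A nonsignificant chi-square result, as reported, would indicate that the proportion of correct answers does not differ detectably among the three subcategories.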

Publisher

JMIR Publications Inc.

