Use of artificial intelligence chatbots in clinical management of immune-related adverse events

Published:2024-05 Issue:5 Volume:12 Page:e008599
ISSN:2051-1426
Container-title:Journal for ImmunoTherapy of Cancer
language:en
Short-container-title:J Immunother Cancer

Author:

Burnette Hannah, Pabani Aliyah, von Itzstein Mitchell S, Switzer Benjamin, Fan Run, Ye Fei, Puzanov Igor, Naidoo Jarushka, Ascierto Paolo A, Gerber David E, Ernstoff Marc S, Johnson Douglas B

Abstract

Background: Artificial intelligence (AI) chatbots have become a major source of general and medical information, though their accuracy and completeness are still being assessed. Their utility for answering questions about immune-related adverse events (irAEs), which are common and potentially dangerous toxicities of cancer immunotherapy, is not well defined.

Methods: We developed 50 distinct questions with answers in available guidelines surrounding 10 irAE categories and queried two AI chatbots (ChatGPT and Bard), along with an additional 20 patient-specific scenarios. Experts in irAE management scored answers for accuracy and completeness using a Likert scale ranging from 1 (least accurate/complete) to 4 (most accurate/complete). Answers across categories and across engines were compared.

Results: Overall, both engines scored highly for accuracy (mean scores for ChatGPT and Bard were 3.87 vs 3.5, p<0.01) and completeness (3.83 vs 3.46, p<0.01). Scores of 1–2 (completely or mostly inaccurate or incomplete) were particularly rare for ChatGPT (6/800 answer-ratings, 0.75%). Of the 50 questions, all eight physician raters gave ChatGPT a rating of 4 (fully accurate or complete) for 22 questions (for accuracy) and 16 questions (for completeness). In the 20 patient scenarios, the average accuracy score was 3.725 (median 4) and the average completeness score was 3.61 (median 4).

Conclusions: AI chatbots provided largely accurate and complete information regarding irAEs, and wildly inaccurate information (“hallucinations”) was uncommon. However, until accuracy and completeness increase further, appropriate guidelines remain the gold standard to follow.

Funder

Susan and Luke Simons Directorship for Melanoma

Division of Cancer Prevention, National Cancer Institute

Van Stephenson Melanoma Fund

James C. Bradford Melanoma Fund

Publisher

BMJ

References (19 articles)

1. ChatGPT and other large language models are double-edged swords;Shen;Radiology,2023

2. Assessment of artificial intelligence chatbot responses to top searched queries about cancer;Pan;JAMA Oncol,2023

3. Use of artificial intelligence chatbots for cancer treatment information;Chen;JAMA Oncol,2023

4. Accuracy and reliability of chatbot responses to physician questions;Goodman;JAMA Netw Open,2023

5. The epidemiology of migraine headache in Arab countries: A systematic review;El-Metwally;ScientificWorldJournal,2020