Affiliations:
1. Division of Breast Surgery, Department of Surgery, Brigham and Women's Hospital, Boston, Massachusetts, USA
2. Breast Oncology Program, Dana-Farber Brigham Cancer Center, Boston, Massachusetts, USA
3. Harvard Medical School, Boston, Massachusetts, USA
4. Center for Surgery and Public Health, Brigham and Women's Hospital, Boston, Massachusetts, USA
5. Medical Oncology, Dana-Farber Cancer Institute, Boston, Massachusetts, USA
6. Radiation Oncology, Dana-Farber Brigham Cancer Center, Boston, Massachusetts, USA
7. The University of Texas MD Anderson Cancer Center, Houston, Texas, USA
Abstract
Background: This study evaluated the accuracy, clinical concordance, and readability of the chatbot interface Generative Pretrained Transformer (ChatGPT) 3.5 as a source of breast cancer information for patients.

Methods: Twenty questions that patients are likely to ask ChatGPT were identified by breast cancer advocates. These were posed to ChatGPT 3.5 in July 2023, and each was repeated three times. Responses were graded in two domains: accuracy (4-point Likert scale, 4 = worst) and clinical concordance (information is clinically similar to a physician response; 5-point Likert scale, 5 = not similar at all). The consistency of responses across repetitions was estimated using the intraclass correlation coefficient (ICC) of word counts. Response readability was calculated using the Flesch-Kincaid readability scale. References were requested and verified.

Results: The overall average accuracy was 1.88 (range, 1.0–3.0; 95% confidence interval [CI], 1.42–1.94), and the average clinical concordance was 2.79 (range, 1.0–5.0; 95% CI, 1.94–3.64). The average word count was 310 words per response (range, 146–441), with high consistency across repetitions (ICC, 0.75; 95% CI, 0.59–0.91; p < .001). The average readability was poor at 37.9 (range, 18.0–60.5), also with high consistency (ICC, 0.73; 95% CI, 0.57–0.90; p < .001). There was a weak correlation between easier readability and better clinical concordance (−0.15; p = .025). Accuracy did not correlate with readability (0.05; p = .079). The average number of references was 1.97 per response (range, 1–4; total, 119). ChatGPT cited a peer-reviewed article only once and often referenced nonexistent websites (41%).

Conclusions: Because ChatGPT 3.5 responses were incorrect 24% of the time and did not provide real references 41% of the time, patients should be cautioned about using ChatGPT for medical information.
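For context on the readability scores reported above (mean, 37.9; range, 18.0–60.5), these values are consistent with the Flesch Reading Ease scale, on which higher scores indicate easier text and scores below approximately 50 correspond to difficult, college-level reading. As a reference point only (the exact scoring tool used in the study is not specified here), the standard Flesch Reading Ease formula is:

\[
\text{Reading Ease} = 206.835 - 1.015\left(\frac{\text{total words}}{\text{total sentences}}\right) - 84.6\left(\frac{\text{total syllables}}{\text{total words}}\right)
\]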