Generative artificial intelligence as a source of breast cancer information for patients: Proceed with caution

Authors:

Park Ko Un 1,2,3 (ORCID); Lipsitz Stuart 3,4; Dominici Laura S. 1,2,3; Lynce Filipa 2,3,5 (ORCID); Minami Christina A. 1,2,3; Nakhlis Faina 1,2,3; Waks Adrienne G. 2,3,5; Warren Laura E. 3,6; Eidman Nadine 7; Frazier Jeannie 7; Hernandez Lourdes 7; Leslie Carla 7; Rafte Susan 7; Stroud Delia 7; Weissman Joel S. 3,4; King Tari A. 1,2,3 (ORCID); Mittendorf Elizabeth A. 1,2,3

Affiliations:

1. Division of Breast Surgery, Department of Surgery, Brigham and Women's Hospital, Boston, Massachusetts, USA

2. Breast Oncology Program, Dana‐Farber Brigham Cancer Center, Boston, Massachusetts, USA

3. Harvard Medical School, Boston, Massachusetts, USA

4. Center for Surgery and Public Health, Brigham and Women's Hospital, Boston, Massachusetts, USA

5. Medical Oncology, Dana‐Farber Cancer Institute, Boston, Massachusetts, USA

6. Radiation Oncology, Dana‐Farber Brigham Cancer Center, Boston, Massachusetts, USA

7. The University of Texas MD Anderson Cancer Center, Houston, Texas, USA

Abstract

Background: This study evaluated the accuracy, clinical concordance, and readability of the chatbot interface generative pretrained transformer (ChatGPT) 3.5 as a source of breast cancer information for patients.

Methods: Twenty questions that patients are likely to ask ChatGPT were identified by breast cancer advocates. These were posed to ChatGPT 3.5 in July 2023 and were repeated three times. Responses were graded in two domains: accuracy (4‐point Likert scale, 4 = worst) and clinical concordance (information is clinically similar to a physician's response; 5‐point Likert scale, 5 = not similar at all). The concordance of responses with repetition was estimated using the intraclass correlation coefficient (ICC) of word counts. Response readability was calculated using the Flesch-Kincaid readability scale. References were requested and verified.

Results: The overall average accuracy was 1.88 (range, 1.0–3.0; 95% confidence interval [CI], 1.42–1.94), and clinical concordance was 2.79 (range, 1.0–5.0; 95% CI, 1.94–3.64). The average word count was 310 words per response (range, 146–441 words per response) with high concordance (ICC, 0.75; 95% CI, 0.59–0.91; p < .001). The average readability was poor at 37.9 (range, 18.0–60.5) with high concordance (ICC, 0.73; 95% CI, 0.57–0.90; p < .001). There was a weak correlation between ease of readability and better clinical concordance (−0.15; p = .025). Accuracy did not correlate with readability (0.05; p = .079). The average number of references was 1.97 (range, 1–4; total, 119). ChatGPT cited peer‐reviewed articles only once and often referenced nonexistent websites (41%).

Conclusions: Because ChatGPT 3.5 responses were incorrect 24% of the time and did not provide real references 41% of the time, patients should be cautioned about using ChatGPT for medical information.
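The abstract reports readability scores on the Flesch-Kincaid scale but does not show how they were computed. For context only, the sketch below illustrates the standard Flesch Reading Ease formula, on which scores around 37.9 (the study's average) fall in the "difficult, college-level" band. The vowel-group syllable counter and the sample sentence are illustrative assumptions, not the validated instrument used by the authors.

```python
import re

def count_syllables(word: str) -> int:
    """Rough syllable count: number of vowel groups (a common heuristic, not a validated counter)."""
    groups = re.findall(r"[aeiouy]+", word.lower())
    return max(1, len(groups))

def flesch_reading_ease(text: str) -> float:
    """Standard Flesch Reading Ease:
    206.835 - 1.015 * (words / sentences) - 84.6 * (syllables / words).
    Higher scores indicate easier text; roughly 30-50 corresponds to college-level prose."""
    sentences = [s for s in re.split(r"[.!?]+", text) if s.strip()]
    words = re.findall(r"[A-Za-z']+", text)
    if not sentences or not words:
        return 0.0
    syllables = sum(count_syllables(w) for w in words)
    return 206.835 - 1.015 * (len(words) / len(sentences)) - 84.6 * (syllables / len(words))

# Hypothetical example of the kind of dense medical prose the study flags as hard to read.
sample = ("Neoadjuvant chemotherapy may be recommended to downstage the tumor "
          "before definitive surgical management is undertaken.")
print(round(flesch_reading_ease(sample), 1))
```

Published studies typically rely on validated readability tools rather than a heuristic like this; the sketch only shows why long sentences and polysyllabic clinical terminology drive scores down.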

Publisher

Wiley
