Evaluating the Accuracy of ChatGPT in Common Patient Questions Regarding HPV+ Oropharyngeal Carcinoma

Authors:

Nikhil Bellamkonda¹, Janice L. Farlow², Catherine T. Haring³, Michael W. Sim², Nolan B. Seim³, Richard B. Cannon¹, Marcus M. Monroe¹, Amit Agrawal³, James W. Rocco³, Hilary C. McCrary¹

Affiliation:

1. Department of Otolaryngology—Head and Neck Surgery, University of Utah, Salt Lake City, UT, USA

2. Department of Otolaryngology—Head and Neck Surgery, Indiana University, Indianapolis, IN, USA

3. Department of Otolaryngology-Head and Neck Surgery, The Ohio State University Wexner Medical Center, Columbus, OH, USA

Abstract

Objectives: Large language model (LLM)-based chatbots such as ChatGPT have been publicly available and increasingly used by the general public since late 2022. This study sought to investigate ChatGPT responses to common patient questions regarding human papillomavirus (HPV)-positive oropharyngeal cancer (OPC).

Methods: This was a prospective, multi-institutional study, with data collected from high-volume institutions that perform >50 transoral robotic surgery cases per year. The 100 most recent discussion threads including the term “HPV” on the American Cancer Society’s Cancer Survivors Network Head and Neck Cancer public discussion board were reviewed. The 11 most common questions were serially queried to ChatGPT 3.5, and the answers were recorded. A survey was distributed to fellowship-trained head and neck oncologic surgeons at 3 institutions to evaluate the responses.

Results: A total of 8 surgeons participated in the study. For questions regarding HPV contraction and transmission, ChatGPT answers were scored as clinically accurate and aligned with consensus in the head and neck surgical oncology community 84.4% and 90.6% of the time, respectively. For questions involving treatment of HPV+ OPC, ChatGPT was clinically accurate and aligned with consensus 87.5% and 91.7% of the time, respectively. For questions regarding the HPV vaccine, ChatGPT was clinically accurate and aligned with consensus 62.5% and 75% of the time, respectively. When asked about circulating tumor DNA testing, only 12.5% of surgeons considered the responses accurate or consistent with consensus.

Conclusion: ChatGPT 3.5 performed poorly on questions involving evolving therapies and diagnostics; caution should therefore be exercised when using a platform like ChatGPT 3.5 to learn about advanced diagnostic and therapeutic technologies. Patients should be counseled on the importance of consulting their surgeons to receive accurate and up-to-date recommendations, and to use LLMs to augment their understanding of these important health-related topics.

Publisher

SAGE Publications
