Large language models and bariatric surgery patient education: a comparative readability analysis of GPT-3.5, GPT-4, Bard, and online institutional resources

Authors

Srinivasan Nitin, Samaan Jamil S., Rajeev Nithya D., Kanu Mmerobasi U., Yeo Yee Hui, Samakar Kamran

Abstract

Background: The readability of online bariatric surgery patient education materials (PEMs) often surpasses the recommended 6th-grade level. Large language models (LLMs), like ChatGPT and Bard, have the potential to revolutionize PEM delivery. We aimed to evaluate the readability of PEMs produced by U.S. medical institutions compared to LLMs, as well as the ability of LLMs to simplify their responses.

Methods: Responses to frequently asked questions (FAQs) related to bariatric surgery were gathered from top-ranked health institutions. FAQ responses were also generated from GPT-3.5, GPT-4, and Bard. LLMs were then prompted to improve the readability of their initial responses. The readability of institutional responses, initial LLM responses, and simplified LLM responses was graded using validated readability formulas. Accuracy and comprehensiveness of initial and simplified LLM responses were also compared.

Results: Responses to 66 FAQs were included. All institutional and initial LLM responses had poor readability, with average reading levels ranging from 9th grade to college graduate. Simplified responses from LLMs had significantly improved readability, with reading levels ranging from 6th grade to college freshman. Among simplified LLM responses, GPT-4 responses demonstrated the highest readability, with reading levels ranging from 6th to 9th grade. Accuracy was similar between initial and simplified responses from all LLMs. Comprehensiveness was similar between initial and simplified responses from GPT-3.5 and GPT-4. However, 34.8% of Bard's simplified responses were graded as less comprehensive than the initial responses.

Conclusion: Our study highlights the efficacy of LLMs in enhancing the readability of bariatric surgery PEMs. GPT-4 outperformed other models, generating simplified PEMs at 6th to 9th grade reading levels. Unlike GPT-3.5 and GPT-4, Bard's simplified responses were graded as less comprehensive. We advocate for future studies examining the potential role of LLMs as dynamic and personalized sources of PEMs for diverse patient populations of all literacy levels.
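
The abstract does not name the validated readability formulas that were used. As an illustration only, the sketch below computes the Flesch-Kincaid Grade Level, one widely used formula that maps text to a U.S. school grade. Choosing this formula is an assumption, not a statement of the authors' method, and the function names `count_syllables` and `flesch_kincaid_grade` are hypothetical. The syllable counter is a rough vowel-group heuristic, so scores will differ somewhat from those of dedicated readability tools.

```python
import re


def count_syllables(word: str) -> int:
    """Approximate syllables by counting vowel groups (heuristic, not exact)."""
    word = word.lower()
    count = len(re.findall(r"[aeiouy]+", word))
    # Treat a trailing 'e' as silent, except in '-le' endings such as "table".
    if word.endswith("e") and not word.endswith("le") and count > 1:
        count -= 1
    return max(count, 1)


def flesch_kincaid_grade(text: str) -> float:
    """Flesch-Kincaid Grade Level:
    0.39 * (words / sentences) + 11.8 * (syllables / words) - 15.59
    """
    # Naive sentence split on terminal punctuation (assumption: plain prose).
    sentences = [s for s in re.split(r"[.!?]+", text) if s.strip()]
    words = re.findall(r"[A-Za-z']+", text)
    if not sentences or not words:
        raise ValueError("text must contain at least one sentence and one word")
    syllables = sum(count_syllables(w) for w in words)
    return 0.39 * len(words) / len(sentences) + 11.8 * syllables / len(words) - 15.59


if __name__ == "__main__":
    # Hypothetical example sentences, not taken from the study data.
    original = "Laparoscopic sleeve gastrectomy substantially reduces gastric capacity."
    simplified = "The surgery makes your stomach smaller."
    print(round(flesch_kincaid_grade(original), 1))
    print(round(flesch_kincaid_grade(simplified), 1))
```

Under this formula, longer sentences and words with more syllables raise the grade level, which is why prompting for shorter sentences and plainer vocabulary would be expected to lower the score.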

Funder

University of Southern California

Publisher

Springer Science and Business Media LLC

Cited by 3 articles.
