Readability and Presentation Suitability of ChatGPT's Medical Responses to Patient Questions: Cross-Sectional Study (Preprint)

Authors:

Jang Chan Woong, Yoo Myungeun, Park Yoon Ghil

Abstract

BACKGROUND

Online medical information, including content generated by ChatGPT, is crucial for patients making health decisions. However, many patients have limited health literacy, so such content should be written at a level the average adult can read easily. Surprisingly, no research has examined how well ChatGPT delivers medical information in text form.

OBJECTIVE

To assess the readability and presentation suitability of ChatGPT responses to the most commonly asked patient questions, as well as ChatGPT's ability to improve readability.

METHODS

This study involved two phases. First, we evaluated the readability and presentation suitability of ChatGPT's medical responses to 30 knee osteoarthritis (OA)-related questions on March 20, 2023. We applied the Flesch-Kincaid Grade Level (FKGL) and Simple Measure of Gobbledygook (SMOG) readability formulas. Additionally, we used three evaluation tools: the Suitability Assessment of Materials (SAM) for presentation scores, and the Ensuring Quality Information for Patients (EQIP) and modified DISCERN (mDISCERN) instruments for overall quality scores. Second, we assessed readability improvement in responses to 50 stroke-related questions after providing ChatGPT with either detailed or simple instructions, again using the FKGL and SMOG readability tests.
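For background, both readability formulas estimate the US school grade level needed to understand a text from its sentence, word, and syllable counts; the standard formulations (not restated in the abstract itself) are:

\mathrm{FKGL} = 0.39\left(\frac{\text{total words}}{\text{total sentences}}\right) + 11.8\left(\frac{\text{total syllables}}{\text{total words}}\right) - 15.59

\mathrm{SMOG} = 1.0430\sqrt{\text{polysyllabic words}\times\frac{30}{\text{total sentences}}} + 3.1291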

RESULTS

In the readability assessment, the mean (standard deviation, SD) scores of the 30 responses regarding knee OA were as follows: FKGL, 13.65 (1.80) reading grade level and SMOG, 15.62 (1.55) reading grade level, both of which were significantly higher than the recommended sixth-grade reading level (P < 0.001). In the presentation suitability assessment, the mean SAM score across all answers was 55.00 (10.64), which is rated "adequate." The mean EQIP and mDISCERN scores were 43.72 (5.78) and 2.83 (0.59), respectively, and none of the responses was rated as high quality. After applying detailed and simple instructions to the 50 responses regarding stroke, one-way ANOVA indicated statistically significant differences in mean readability scores among the three groups: pre-intervention, post-intervention with detailed instructions, and post-intervention with simple instructions (P < 0.001). Post-hoc analysis revealed that the pre-intervention group differed significantly from both post-intervention groups on both readability measures (P < 0.001 for each). However, there was no significant difference between the two post-intervention groups (P = 0.96 for FKGL and P = 0.86 for SMOG).
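As a minimal sketch of how such a comparison could be run, the snippet below performs a one-way ANOVA followed by pairwise post-hoc comparisons (Tukey HSD is shown as one common choice; the abstract does not specify the post-hoc method). The arrays of per-response FKGL scores are illustrative placeholders, not the study's data.

import numpy as np
from scipy import stats
from statsmodels.stats.multicomp import pairwise_tukeyhsd

# Illustrative placeholder FKGL scores for the three groups (not the study's data)
pre = np.array([13.2, 14.1, 12.8, 13.9, 14.5])
post_detailed = np.array([11.0, 10.5, 11.8, 10.9, 11.3])
post_simple = np.array([11.2, 10.8, 11.5, 11.0, 11.4])

# One-way ANOVA across the three groups
f_stat, p_value = stats.f_oneway(pre, post_detailed, post_simple)
print(f"ANOVA: F = {f_stat:.2f}, P = {p_value:.4f}")

# Post-hoc pairwise comparisons (Tukey HSD, one common option)
scores = np.concatenate([pre, post_detailed, post_simple])
groups = (["pre"] * len(pre)
          + ["post_detailed"] * len(post_detailed)
          + ["post_simple"] * len(post_simple))
print(pairwise_tukeyhsd(scores, groups))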

CONCLUSIONS

This study found that ChatGPT's responses are difficult to read and of low quality, which may burden patients, even though the medical information is presented in an adequate format. Furthermore, ChatGPT showed limited ability to improve the readability of medical information. As the technology advances, improving the readability and user-friendliness of ChatGPT's responses will increase its usefulness for patients.

CLINICALTRIAL

Not applicable.

Publisher

JMIR Publications Inc.
