Assessment of readability, reliability, and quality of ChatGPT®, BARD®, Gemini®, Copilot®, Perplexity® responses on palliative care

Published:2024-08-16 Issue:33 Volume:103 Page:e39305
ISSN:1536-5964
Container-title:Medicine
language:en

Author:

Hancı Volkan¹, Ergün Bişar², Gül Şanser³, Uzun Özcan⁴, Erdemir İsmail⁵, Hancı Ferid Baran⁶

Affiliation:

1. Clinic of Anesthesiology and Critical Care, Sincan Education and Research Hospital, Ankara, Turkey

2. Clinic of Internal Medicine and Critical Care, Dr. Ismail Fehmi Cumalioğlu City Hospital, Tekirdağ, Turkey

3. Clinic of Neurosurgery, Ankara Ataturk Sanatory Education and Research Hospital, Ankara, Turkey

4. Clinic of Internal Medicine and Nephrology, Yalova City Hospital, Yalova, Turkey

5. Department of Anesthesiology and Critical Care, Faculty of Medicine, Dokuz Eylül University, Izmir, Turkey

6. Artificial Intelligence Engineering Department, Faculty of Engineering, Ostim Technical University, Ankara, Turkey

Abstract

No study has comprehensively evaluated the readability and quality of “palliative care” information provided by the artificial intelligence (AI) chatbots ChatGPT®, Bard®, Gemini®, Copilot®, and Perplexity®. This observational, cross-sectional study asked each of these 5 AI chatbots to answer the 100 questions most frequently asked by patients about palliative care, and the responses from each chatbot were analyzed separately. The study did not involve any human participants. Readability differed significantly among the responses of the 5 AI chatbots (P < .05). When the different readability indexes were evaluated holistically, the chatbot responses ranked from easiest to most difficult to read as Bard®, Copilot®, Perplexity®, ChatGPT®, and Gemini® (P < .05). The median readability indexes of each chatbot's responses were compared with the recommended 6th grade reading level, and statistically significant differences were observed for all formulas (P < .001). The responses of all 5 AI chatbots required an educational level well above the 6th grade. The modified DISCERN and Journal of the American Medical Association scores were highest for Perplexity® (P < .001), and Gemini® responses had the highest Global Quality Scale score (P < .001). Patient education materials are recommended to be written at a 6th grade reading level. The current answers about palliative care from all 5 AI chatbots evaluated (Bard®, Copilot®, Perplexity®, ChatGPT®, and Gemini®) were well above the recommended level in terms of text readability. Text content quality assessment scores were also low. Both the quality and the readability of the texts should be brought within the recommended limits.
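
As an illustration of how a readability index can be compared with the 6th grade target, the sketch below computes the Flesch-Kincaid Grade Level, a widely used formula. The abstract does not name the formulas the study used, so this choice is an assumption. The function names (`count_syllables`, `flesch_kincaid_grade`, `exceeds_target`) are hypothetical, and the syllable counter is a rough vowel-group heuristic rather than a validated method.

```python
import re

TARGET_GRADE = 6.0  # recommended reading level cited in the abstract


def count_syllables(word: str) -> int:
    """Rough heuristic (an assumption): one syllable per vowel group."""
    lowered = word.lower()
    count = len(re.findall(r"[aeiouy]+", lowered))
    # Treat a trailing "e" (as in "care") as silent, except in "-le" endings.
    if lowered.endswith("e") and not lowered.endswith("le") and count > 1:
        count -= 1
    return max(count, 1)


def flesch_kincaid_grade(text: str) -> float:
    """Flesch-Kincaid Grade Level:
    0.39 * (words / sentences) + 11.8 * (syllables / words) - 15.59
    """
    sentences = [s for s in re.split(r"[.!?]+", text) if s.strip()]
    words = re.findall(r"[A-Za-z']+", text)
    if not sentences or not words:
        raise ValueError("text must contain at least one sentence and one word")
    syllables = sum(count_syllables(w) for w in words)
    return 0.39 * len(words) / len(sentences) + 11.8 * syllables / len(words) - 15.59


def exceeds_target(text: str, target: float = TARGET_GRADE) -> bool:
    """True when the estimated grade level is above the target grade."""
    return flesch_kincaid_grade(text) > target
```

Short sentences built from short words score low, while a single long sentence dense with polysyllabic clinical terms scores far above grade 6, which is the pattern the study reports for the chatbot responses. The heuristic syllable count makes the absolute values approximate.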

Publisher

Ovid Technologies (Wolters Kluwer Health)
