Affiliation:
1. Department of Pathology, SUNY (State University of New York) Downstate Medical Center, Brooklyn, New York
2. Lake Erie College of Osteopathic Medicine, Elmira, New York
3. Department of Pathology, Memorial Sloan Kettering Cancer Center, New York, New York
Abstract
Importance
Anatomic pathology reports are an essential part of health care, containing vital diagnostic and prognostic information. Most patients now have online access to their test results, but the reports are complex and generally incomprehensible to laypeople. Artificial intelligence chatbots could potentially simplify pathology reports.

Objective
To evaluate the ability of large language model chatbots to accurately explain pathology reports to patients.

Design, Setting, and Participants
This cross-sectional study used 1134 pathology reports, dated January 1, 2018, to May 31, 2023, from a multispecialty hospital in Brooklyn, New York. A new chat was started for each report, and both chatbots (Bard [Google Inc], hereinafter chatbot 1; GPT-4 [OpenAI], hereinafter chatbot 2) were asked in sequential prompts to explain the report in simple terms and identify key information. Chatbot responses were generated between June 1 and August 31, 2023. The mean readability scores of the original and simplified reports were compared. Two reviewers independently screened and flagged reports with potential errors. Three pathologists reviewed the flagged reports, categorized them as medically correct, partially medically correct, or medically incorrect, and recorded any instances of hallucination.

Main Outcomes and Measures
Outcomes included improved mean readability scores and a medically accurate interpretation.

Results
For the 1134 reports included, the mean Flesch-Kincaid grade level decreased from 13.19 (95% CI, 12.98-13.41) to 8.17 (95% CI, 8.08-8.25; t = 45.29; P < .001) with chatbot 1 and to 7.45 (95% CI, 7.35-7.54; t = 49.69; P < .001) with chatbot 2. The mean Flesch Reading Ease score increased from 10.32 (95% CI, 8.69-11.96) to 61.32 (95% CI, 60.80-61.84; t = −63.19; P < .001) with chatbot 1 and to 70.80 (95% CI, 70.32-71.28; t = −74.61; P < .001) with chatbot 2. Chatbot 1 interpreted 993 reports (87.57%) correctly, 102 (8.99%) partially correctly, and 39 (3.44%) incorrectly; chatbot 2 interpreted 1105 reports (97.44%) correctly, 24 (2.12%) partially correctly, and 5 (0.44%) incorrectly. Chatbot 1 produced hallucinations in 32 reports (2.82%); chatbot 2, in 3 (0.26%).

Conclusions and Relevance
The findings of this cross-sectional study suggest that artificial intelligence chatbots can simplify pathology reports, although some inaccuracies and hallucinations occurred. Simplified reports should be reviewed by clinicians before distribution to patients.
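The abstract does not state which software computed the readability metrics, though the Flesch formulas themselves are standard. Below is a minimal, illustrative sketch of how the two scores and the paired comparison could be reproduced in Python; the textstat and scipy library choices, the toy report texts, and the use of a paired t test are assumptions for illustration, not details taken from the article.

```python
# Illustrative sketch (not the study's actual code): scoring a pathology
# report and its chatbot-simplified version with the two readability
# metrics named in the article, then running a paired t test.
#
# Flesch Reading Ease  = 206.835 - 1.015*(words/sentences) - 84.6*(syllables/words)
# Flesch-Kincaid grade = 0.39*(words/sentences) + 11.8*(syllables/words) - 15.59
import textstat
from scipy import stats

# Two toy report pairs stand in for the study's 1134 original/simplified pairs.
originals = [
    "Sections of gastric antrum demonstrate chronic active gastritis "
    "with intestinal metaplasia; Helicobacter pylori organisms are identified.",
    "Invasive ductal carcinoma, Nottingham grade 2, measuring 1.4 cm; "
    "surgical margins are negative for malignancy.",
]
simplified = [
    "The stomach sample shows long-term irritation caused by a germ "
    "called H. pylori, along with some cell changes to keep an eye on.",
    "The breast sample shows a cancer about half an inch wide. It was "
    "removed completely, with no cancer at the edges of the tissue.",
]

# Lower Flesch-Kincaid grade and higher Flesch Reading Ease both mean
# the text is easier to read.
fk_orig = [textstat.flesch_kincaid_grade(t) for t in originals]
fk_simp = [textstat.flesch_kincaid_grade(t) for t in simplified]
fre_orig = [textstat.flesch_reading_ease(t) for t in originals]
fre_simp = [textstat.flesch_reading_ease(t) for t in simplified]

# Paired t test across report pairs; with all 1134 pairs this would mirror
# the t statistics and P values reported in the Results.
t_fk, p_fk = stats.ttest_rel(fk_orig, fk_simp)
t_fre, p_fre = stats.ttest_rel(fre_orig, fre_simp)
print(f"FK grade:     {fk_orig} -> {fk_simp}  (t={t_fk:.2f}, P={p_fk:.3g})")
print(f"Reading Ease: {fre_orig} -> {fre_simp}  (t={t_fre:.2f}, P={p_fre:.3g})")
```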
Publisher:
American Medical Association (AMA)