A comparative evaluation of ChatGPT 3.5 and ChatGPT 4 in responses to selected genetics questions

Published: 2024-06-14
ISSN: 1067-5027
Journal: Journal of the American Medical Informatics Association
Language: English

Author:

McGrath Scott P¹, Kozel Beth A², Gracefo Sara³, Sutherland Nykole³, Danford Christopher J⁴, Walton Nephi⁵

Affiliation:

1. CITRIS Health, University of California Berkeley, Berkeley, CA 94720-1764, United States

2. Laboratory of Vascular and Matrix Genetics, National Heart, Lung, and Blood Institute (NHLBI), Bethesda, MD 20892, United States

3. Intermountain Precision Genomics, Intermountain Healthcare, St George, UT 84790-8723, United States

4. Transplant Services, Intermountain Medical Center, Murray, UT 84107, United States

5. National Human Genome Research Institute, National Institutes of Health, Bethesda, MD 20892-2152, United States

Abstract

Objectives: To evaluate the efficacy of ChatGPT 4 (GPT-4) in delivering genetic information about BRCA1, HFE, and MLH1, building on previous findings with ChatGPT 3.5 (GPT-3.5), with a focus on assessing the utility, limitations, and ethical implications of using ChatGPT in medical settings.

Materials and Methods: A structured survey was developed to assess GPT-4’s clinical value. An expert panel of genetic counselors and clinical geneticists evaluated GPT-4’s responses to the selected genetics questions. We also performed a comparative analysis with GPT-3.5 using descriptive statistics, with data analyzed in Prism 9.

Results: The findings indicate improved accuracy for GPT-4 over GPT-3.5 (P < .0001); however, notable accuracy errors remained. The relevance of GPT-4’s responses varied but was generally favorable, with a mean in the “somewhat agree” range. There was no difference in performance by disease category. The 7-question subset of the Bot Usability Scale (BUS-15) showed no statistically significant difference between the groups but trended lower for GPT-4.

Discussion and Conclusion: The study underscores GPT-4’s potential role in genetic education: it shows notable progress but still faces challenges such as outdated information and the need for ongoing refinement. Our results, while promising, emphasize the importance of balancing technological innovation with ethical responsibility in healthcare information delivery.

Funder

NIH

Publisher

Oxford University Press (OUP)

Link

https://academic.oup.com/jamia/advance-article-pdf/doi/10.1093/jamia/ocae128/58231182/ocae128.pdf

References: 60 articles.

Cited by: 4 articles.

1. Diagnostic Accuracy of a Custom Large Language Model on Rare Pediatric Disease Case Reports;American Journal of Medical Genetics Part A;2024-09-13

2. Large Language Models to Help Appeal Denied Radiotherapy Services;JCO Clinical Cancer Informatics;2024-09

3. Chatbot for the Return of Positive Genetic Screening Results for Hereditary Cancer Syndromes: a Prompt Engineering Study;2024-08-29

4. Chatbot for the Return of Positive Genetic Screening Results for Hereditary Cancer Syndromes: a Prompt Engineering Study (Preprint);2024-08-27