Performance of ChatGPT on Nephrology Test Questions

Published:2023-10-18 Issue:1 Volume:19 Page:35-43
ISSN:1555-9041
Container-title:Clinical Journal of the American Society of Nephrology
language:en
Short-container-title:CJASN

Author:

Miao Jing, Thongprayoon Charat, Garcia Valencia Oscar A., Krisanapan Pajaree, Sheikh Mohammad S., Davis Paul W., Mekraksakit Poemlarp, Suarez Maria Gonzalez, Craici Iasmina M., Cheungpasitporn Wisit

Abstract

Background ChatGPT is a novel tool that allows people to engage in conversations with an advanced machine learning model. ChatGPT's performance in the US Medical Licensing Examination is comparable with a successful candidate's performance. However, its performance in the nephrology field remains undetermined. This study assessed ChatGPT's capabilities in answering nephrology test questions.
Methods Questions sourced from Nephrology Self-Assessment Program and Kidney Self-Assessment Program were used, each with multiple-choice single-answer questions. Questions containing visual elements were excluded. Each question bank was run twice using GPT-3.5 and GPT-4. Total accuracy rate, defined as the percentage of correct answers obtained by ChatGPT in either the first or second run, and the total concordance, defined as the percentage of identical answers provided by ChatGPT during both runs, regardless of their correctness, were used to assess its performance.
Results A comprehensive assessment was conducted on a set of 975 questions, comprising 508 questions from Nephrology Self-Assessment Program and 467 from Kidney Self-Assessment Program. GPT-3.5 resulted in a total accuracy rate of 51%. Notably, the employment of Nephrology Self-Assessment Program yielded a higher accuracy rate compared with Kidney Self-Assessment Program (58% versus 44%; P < 0.001). The total concordance rate across all questions was 78%, with correct answers exhibiting a higher concordance rate (84%) compared with incorrect answers (73%) (P < 0.001). When examining various nephrology subfields, the total accuracy rates were relatively lower in electrolyte and acid-base disorder, glomerular disease, and kidney-related bone and stone disorders. The total accuracy rate of GPT-4's response was 74%, higher than GPT-3.5 (P < 0.001) but remained below the passing threshold and average scores of nephrology examinees (77%).
Conclusions ChatGPT exhibited limitations regarding accuracy and repeatability when addressing nephrology-related questions. Variations in performance were evident across various subfields.
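
The two metrics defined in the Methods can be illustrated with a minimal Python sketch. It is not the authors' code. The function names and the toy answers are hypothetical, and the sketch assumes one reading of the accuracy definition: a question counts as correct if at least one of the two runs matches the answer key. The abstract's wording could also be read as pooling both runs, which would give a different number.

```python
from typing import Sequence


def _check(*answer_lists: Sequence[str]) -> int:
    """Return the shared length of the answer lists, rejecting bad input."""
    lengths = {len(answers) for answers in answer_lists}
    if len(lengths) != 1:
        raise ValueError("all answer lists must have the same length")
    n = lengths.pop()
    if n == 0:
        raise ValueError("answer lists must not be empty")
    return n


def total_accuracy(run1: Sequence[str], run2: Sequence[str],
                   key: Sequence[str]) -> float:
    """Percentage of questions answered correctly in at least one run.

    Assumption: "either the first or second run" is read per question.
    """
    n = _check(run1, run2, key)
    hits = sum(1 for a, b, k in zip(run1, run2, key) if a == k or b == k)
    return 100.0 * hits / n


def total_concordance(run1: Sequence[str], run2: Sequence[str]) -> float:
    """Percentage of questions with identical answers in both runs,
    regardless of whether those answers are correct."""
    n = _check(run1, run2)
    same = sum(1 for a, b in zip(run1, run2) if a == b)
    return 100.0 * same / n


# Toy data (hypothetical, not taken from the study).
key = ["A", "B", "C", "D"]
run1 = ["A", "B", "D", "C"]
run2 = ["A", "C", "D", "D"]

print(total_accuracy(run1, run2, key))   # 75.0
print(total_concordance(run1, run2))     # 50.0
```

In the toy data, question 3 is answered identically but wrongly in both runs, so it raises concordance without raising accuracy. That is why the study reports the two rates separately.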

Publisher

Ovid Technologies (Wolters Kluwer Health)

Subject

Transplantation,Nephrology,Critical Care and Intensive Care Medicine,Epidemiology

References: 26 articles.

1. Revolutionizing chronic kidney disease management with machine learning and artificial intelligence;Krisanapan;J Clin Med.,2023

2. Promises of Big data and artificial intelligence in nephrology and transplantation;Thongprayoon;J Clin Med.,2020

3. Use of machine learning consensus clustering to identify distinct subtypes of Black kidney transplant recipients and associated outcomes;Thongprayoon;JAMA Surg.,2022

4. The role of ChatGPT, generative language models, and artificial intelligence in medical education: a conversation with ChatGPT and a call for papers;Eysenbach;JMIR Med Educ.,2023

5. ChatGPT utility in healthcare education, research, and practice: systematic review on the promising perspectives and valid concerns;Sallam;Healthcare (Basel).,2023

Cited by 25 articles.

1. Artificial intelligence and machine learning’s role in sepsis-associated acute kidney injury;Kidney Research and Clinical Practice;2024-07-31

2. Transforming Healthcare: The AI Revolution in the Comprehensive Care of Hypertension;Clinics and Practice;2024-07-10

3. STAGER checklist: Standardized testing and assessment guidelines for evaluating generative artificial intelligence reliability;iMetaOmics;2024-07-02

4. Evaluating the performance of ChatGPT-3.5 and ChatGPT-4 on the Taiwan plastic surgery board examination;Heliyon;2024-07

5. The potential of ChatGPT in medicine: an example analysis of nephrology specialty exams in Poland;Clinical Kidney Journal;2024-06-22