Diagnostic Accuracy of Differential-Diagnosis Lists Generated by Generative Pretrained Transformer 3 Chatbot for Clinical Vignettes with Common Chief Complaints: A Pilot Study

Published: 2023-02-15
Volume: 20, Issue: 4, Page: 3378
ISSN: 1660-4601
Journal: International Journal of Environmental Research and Public Health (IJERPH)
Language: en

Authors:

Hirosawa Takanobu¹, Harada Yukinori¹, Yokose Masashi¹, Sakamoto Tetsu¹, Kawamura Ren¹, Shimizu Taro¹

Affiliation:

1. Department of Diagnostic and Generalist Medicine, Dokkyo Medical University, Tochigi 321-0293, Japan

Abstract

The diagnostic accuracy of differential diagnoses generated by artificial intelligence (AI) chatbots, including the generative pretrained transformer 3 (GPT-3) chatbot (ChatGPT-3), is unknown. This study evaluated the accuracy of differential-diagnosis lists generated by ChatGPT-3 for clinical vignettes with common chief complaints. General internal medicine physicians created clinical cases, correct diagnoses, and five differential diagnoses for ten common chief complaints. The rate of correct diagnosis by ChatGPT-3 within its ten-item differential-diagnosis lists was 28/30 (93.3%). Physicians nevertheless outperformed ChatGPT-3 when only the top five differential diagnoses were considered (98.3% vs. 83.3%, p = 0.03), and also for the top diagnosis alone (93.3% vs. 53.3%, p < 0.001). Of the physicians' differential diagnoses, 62/88 (70.5%) also appeared in the ten-item lists generated by ChatGPT-3. In summary, this study demonstrates the high diagnostic accuracy of differential-diagnosis lists generated by ChatGPT-3 for clinical cases with common chief complaints. This suggests that AI chatbots such as ChatGPT-3 can generate a well-differentiated diagnosis list for common chief complaints. However, the ranking of diagnoses within these lists could be improved.
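The abstract reports accuracy at three cut-offs of a ranked list (top diagnosis, top five, top ten). The following is a minimal Python sketch of how such top-k rates could be computed. It assumes exact string matching between a listed diagnosis and the correct one, because the abstract does not say how matches were adjudicated. The function names `top_k_accuracy` and `as_percent` and the toy vignettes are hypothetical illustrations, not the authors' code or data.

```python
from typing import Sequence


def top_k_accuracy(ranked_lists: Sequence[Sequence[str]],
                   answers: Sequence[str], k: int) -> float:
    """Fraction of cases whose correct diagnosis appears in the first k entries.

    Exact string matching is a simplifying assumption (hypothetical helper);
    in practice a clinician would judge whether a listed diagnosis is correct.
    """
    if len(ranked_lists) != len(answers):
        raise ValueError("expected one ranked list per case")
    if not answers:
        raise ValueError("need at least one case")
    hits = sum(1 for ranked, answer in zip(ranked_lists, answers)
               if answer in ranked[:k])
    return hits / len(answers)


def as_percent(numerator: int, denominator: int) -> str:
    """Format a count ratio to one decimal place, e.g. 28/30 -> '93.3%'."""
    return f"{100 * numerator / denominator:.1f}%"


if __name__ == "__main__":
    # Hypothetical toy vignettes, NOT data from the study.
    lists = [
        ["migraine", "tension-type headache", "meningitis"],
        ["gastroenteritis", "appendicitis", "cholecystitis"],
        ["pneumonia", "bronchitis", "pulmonary embolism"],
    ]
    answers = ["migraine", "cholecystitis", "heart failure"]
    for k in (1, 3):
        print(f"top-{k}: {top_k_accuracy(lists, answers, k):.3f}")
    # Re-derive two of the abstract's reported percentages from their counts.
    print(as_percent(28, 30), as_percent(62, 88))
```

On the toy data, top-1 accuracy is 0.333 and top-3 accuracy is 0.667, and the last line prints `93.3% 70.5%`, matching the abstract's 28/30 and 62/88. Widening k can only keep or raise the rate, which is why a list can score well at top ten while its ordering still leaves room for improvement.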

Publisher

MDPI AG

Subject

Health, Toxicology and Mutagenesis; Public Health, Environmental and Occupational Health

Link

https://www.mdpi.com/1660-4601/20/4/3378/pdf

Cited by 202 articles.

1. Comparative evaluation of artificial intelligence systems' accuracy in providing medical drug dosages: A methodological study; World Journal of Methodology; 2024-12-20

2. Hepatic encephalopathy post-TIPS: Current status and prospects in predictive assessment; Computational and Structural Biotechnology Journal; 2024-12

3. ChatGPT and neurosurgical education: A crossroads of innovation and opportunity; Journal of Clinical Neuroscience; 2024-11

4. Generative artificial intelligence versus clinicians: Who diagnoses multiple sclerosis faster and with greater accuracy?; Multiple Sclerosis and Related Disorders; 2024-10

5. Analysis of Responses of GPT-4 V to the Japanese National Clinical Engineer Licensing Examination; Journal of Medical Systems; 2024-09-11