Validation of a Deep Learning Chest X-ray Interpretation Model: Integrating Large-Scale AI and Large Language Models for Comparative Analysis with ChatGPT

Author:

Lee Kyu Hong 1, Lee Ro Woon 1, Kwon Ye Eun 1

Affiliation:

1. Department of Radiology, College of Medicine, Inha University, Incheon 22212, Republic of Korea

Abstract

This study evaluates the diagnostic accuracy and clinical utility of two artificial intelligence (AI) techniques: Kakao Brain Artificial Neural Network for Chest X-ray Reading (KARA-CXR), an assistive technology developed using large-scale AI and large language models (LLMs), and ChatGPT, a well-known LLM. The study was conducted to validate the performance of the two technologies in chest X-ray reading and to explore their potential applications in medical imaging diagnosis. For the study, 2000 chest X-ray images were randomly selected from a single institution’s patient database, and two radiologists evaluated the readings provided by KARA-CXR and ChatGPT. The readings generated by each model were evaluated on five qualitative factors: accuracy, false findings, location inaccuracies, count inaccuracies, and hallucinations. Statistical analysis showed that KARA-CXR achieved significantly higher diagnostic accuracy than ChatGPT. In the ‘Acceptable’ accuracy category, KARA-CXR was rated at 70.50% and 68.00% by the two observers, while ChatGPT was rated at 40.50% and 47.00%. Interobserver agreement was moderate for both systems, at 0.74 for KARA-CXR and 0.73 for ChatGPT (GPT4). For ‘False Findings’, KARA-CXR scored 68.00% and 68.50%, while ChatGPT scored 37.00% for both observers, with high interobserver agreement of 0.96 for KARA-CXR and 0.97 for ChatGPT. In ‘Location Inaccuracy’ and ‘Hallucinations’, KARA-CXR outperformed ChatGPT by significant margins. KARA-CXR demonstrated a non-hallucination rate of 75%, which is significantly higher than ChatGPT’s 38%. In the hallucination category, interobserver agreement was high for KARA-CXR (0.91) and moderate to high for ChatGPT (0.85). In conclusion, this study demonstrates the potential of AI and large-scale language models in medical imaging and diagnostics. It also shows that in the chest X-ray domain, KARA-CXR has relatively higher accuracy than ChatGPT.
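The abstract reports interobserver agreement values but does not name the statistic behind them. As a minimal sketch, assuming a chance-corrected measure such as Cohen's kappa for two raters, agreement on a categorical rating could be computed as follows. The function name `cohen_kappa`, the category labels, and the six example ratings are hypothetical illustrations and are not data from the study.

```python
from collections import Counter


def cohen_kappa(ratings_a, ratings_b):
    """Cohen's kappa for two raters who labelled the same items.

    Assumption: this is an illustrative statistic; the abstract does not
    state which agreement measure the authors used.
    """
    if len(ratings_a) != len(ratings_b) or not ratings_a:
        raise ValueError("need two equal-length, non-empty rating lists")
    n = len(ratings_a)

    # Observed agreement: fraction of items given the same label by both raters.
    observed = sum(a == b for a, b in zip(ratings_a, ratings_b)) / n

    # Expected agreement by chance, from each rater's marginal label counts.
    counts_a = Counter(ratings_a)
    counts_b = Counter(ratings_b)
    expected = sum(counts_a[c] * counts_b[c] for c in counts_a) / (n * n)

    if expected == 1.0:
        # Both raters used one identical label throughout; treat as perfect agreement.
        return 1.0
    return (observed - expected) / (1.0 - expected)


# Hypothetical ratings of six readings by two observers (not study data).
observer_1 = ["acceptable", "acceptable", "unacceptable",
              "acceptable", "unacceptable", "unacceptable"]
observer_2 = ["acceptable", "unacceptable", "unacceptable",
              "acceptable", "unacceptable", "acceptable"]

kappa = cohen_kappa(observer_1, observer_2)
print(f"kappa = {kappa:.2f}")
```

Kappa corrects the raw agreement rate for the agreement two raters would reach by chance, which is why it can be much lower than the percentage of matching ratings.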

Funder

KakaoBrain

Publisher

MDPI AG

Subject

Clinical Biochemistry

Cited by 1 article.
