Abstract
Large Language Models (LLMs) are increasingly multimodal, and zero-shot Visual Question Answering (VQA) shows promise for image interpretation. If zero-shot VQA could be applied to the 12-lead electrocardiogram (ECG), a prevalent diagnostic tool in medicine, the potential benefits to the field would be substantial. This study evaluated the diagnostic performance of zero-shot VQA with multimodal LLMs on 12-lead ECG images. The results revealed that multimodal LLMs tended to make more errors in extracting and verbalizing image features than in describing preconditions or making logical inferences. Even when the answers were correct, erroneous descriptions of image features were common. These findings suggest a need for better control of image hallucination and indicate that the percentage of correct answers to multiple-choice questions may be insufficient for evaluating performance on VQA tasks.