
MedFuseNet: An attention-based multimodal deep learning model for visual question answering in the medical domain

Published: 2021-10-06
Journal: Scientific Reports (Sci Rep), Volume 11, Issue 1
ISSN: 2045-2322
Language: en

Authors

Dhruv Sharma, Sanjay Purushotham, Chandan K. Reddy

Abstract

Medical images are difficult to comprehend for a person without expertise. Medical practitioners, who are scarce across the globe, often face physical and mental fatigue due to the high number of cases, which can induce human errors during diagnosis. In such scenarios, an additional opinion can help boost the confidence of the decision maker. Thus, it becomes crucial to have a reliable visual question answering (VQA) system that can provide a ‘second opinion’ on medical cases. However, most of today's VQA systems cater to general real-world problems and are not specifically tailored for handling medical images. Moreover, a VQA system for medical images must cope with the limited amount of training data available in this domain. In this paper, we develop MedFuseNet, an attention-based multimodal deep learning model for VQA on medical images that takes these challenges into account. MedFuseNet aims to maximize learning with minimal complexity by breaking the problem into simpler tasks before predicting the answer. We tackle two types of answer prediction: categorization and generation. We conducted an extensive set of quantitative and qualitative analyses to evaluate the performance of MedFuseNet. Our experiments demonstrate that MedFuseNet outperforms state-of-the-art VQA methods, and that visualizing the captured attention showcases the interpretability of our model's predictions.
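The abstract describes MedFuseNet only at a high level. To illustrate what "attention-based multimodal fusion" followed by answer categorization generally means, here is a minimal, self-contained Python sketch. It is not MedFuseNet's architecture: the dot-product attention, the element-wise product used as fusion, the toy two-dimensional features, and all function names (`attend`, `fuse`, `categorize`) are illustrative assumptions. A real model would use learned image and question encoders and learned fusion and classifier weights.

```python
import math
from typing import List, Tuple

Vector = List[float]


def dot(a: Vector, b: Vector) -> float:
    """Inner product of two equal-length vectors."""
    return sum(x * y for x, y in zip(a, b))


def softmax(scores: Vector) -> Vector:
    """Numerically stable softmax: subtract the max before exponentiating."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def attend(question: Vector, regions: List[Vector]) -> Tuple[Vector, Vector]:
    """Question-guided soft attention over image regions (illustrative).

    Each region is scored by its dot product with the question vector; the
    softmax of those scores weights a convex combination of the regions.
    Returns (attended image vector, attention weights).
    """
    weights = softmax([dot(question, r) for r in regions])
    dim = len(regions[0])
    attended = [sum(w * r[i] for w, r in zip(weights, regions)) for i in range(dim)]
    return attended, weights


def fuse(question: Vector, image: Vector) -> Vector:
    """Element-wise product fusion: a simple stand-in for learned fusion layers."""
    return [q * v for q, v in zip(question, image)]


def categorize(fused: Vector, answer_weights: List[Vector],
               answers: List[str]) -> Tuple[str, Vector]:
    """Answer categorization as classification over a fixed answer set.

    answer_weights holds one hypothetical (normally learned) weight vector
    per candidate answer; returns the most probable answer and all probabilities.
    """
    probs = softmax([dot(w, fused) for w in answer_weights])
    best = max(range(len(answers)), key=lambda i: probs[i])
    return answers[best], probs


if __name__ == "__main__":
    # Toy 2-D features: region 0 aligns with the question, region 1 does not.
    question = [1.0, 0.0]
    regions = [[2.0, 0.0], [0.0, 2.0]]
    attended, weights = attend(question, regions)
    answer, probs = categorize(fuse(question, attended),
                               [[1.0, 0.0], [-1.0, 0.0]], ["yes", "no"])
    print(answer, [round(w, 3) for w in weights])  # prints: yes [0.881, 0.119]
```

The attention weights are what an interpretability visualization would display: here most of the weight falls on the region that matches the question. The second answer type, generation, would instead decode the answer token by token from the fused representation and is not covered by this sketch.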

Publisher

Springer Science and Business Media LLC

Subject

Multidisciplinary

Link

https://www.nature.com/articles/s41598-021-98390-1.pdf


Cited by 36 articles.

1. Surgical-VQLA++: Adversarial contrastive learning for calibrated robust visual question-localized answering in robotic surgery. Information Fusion, 2025-01.

2. Narrowing the semantic gaps in U-Net with learnable skip connections: The case of medical image segmentation. Neural Networks, 2024-10.

3. Visual Question Answer System for Skeletal Image Using Radiology Images in the Healthcare Domain Based on Visual and Textual Feature Extraction Techniques. Annals of Data Science, 2024-06-29.

4. ARDN: Attention Re-distribution Network for Visual Question Answering. Arabian Journal for Science and Engineering, 2024-05-01.

5. Self-Attention Based Image Feature Representation for Medical Visual Question Answering. 2024 IEEE 3rd International Conference on Control, Instrumentation, Energy & Communication (CIEC), 2024-01-25.