Author:
Liu Shengyan, Zhang Xuejie, Zhou Xiaobing, Yang Jian
Abstract
Background: Visual question answering in the medical domain (VQA-Med) has great potential to increase confidence in disease diagnosis and to help patients better understand their medical conditions. One challenge in VQA-Med is how to better understand and combine the semantic features of medical images (e.g., X-rays and magnetic resonance imaging (MRI)) and accurately answer the corresponding questions on unlabeled medical datasets.

Method: We propose BPI-MVQA, a novel bi-branched model based on parallel networks and image retrieval for medical visual question answering. The first branch is a transformer built on a parallel network, so that image sequence-feature extraction and spatial-feature extraction complement each other; the multi-modal features are then fused implicitly by a multi-head self-attention mechanism. The second branch compares image features generated by a VGG16 network to retrieve similar images and uses their text descriptions as labels.

Result: BPI-MVQA achieves state-of-the-art results on three VQA-Med datasets, exceeding the best previously reported main-metric scores by 0.2%, 1.4%, and 1.1%, respectively.

Conclusion: The evaluation results support the effectiveness of BPI-MVQA for VQA-Med. The bi-branched design helps the model answer different types of visual questions. The parallel network extracts image features from multiple perspectives, which helps the model better understand the semantic information of an image and achieve higher accuracy in multi-class VQA-Med classification. In addition, image retrieval helps the model answer irregular, open-ended questions by drawing on the information the images provide. Comparison with state-of-the-art methods on three datasets also shows that our method can substantially improve VQA-Med systems.
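The abstract does not state how image similarity is measured in the retrieval branch. The following is a minimal sketch of that idea, assuming precomputed feature vectors (standing in for the VGG16 features), cosine similarity, and a top-k nearest-neighbour lookup over a hypothetical database of (feature, description) pairs. The function names, toy 4-dimensional vectors, and descriptions are illustrative assumptions, not the authors' implementation.

```python
import math
from typing import List, Sequence, Tuple


def cosine_similarity(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine of the angle between two feature vectors (0.0 if either is all zeros)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return dot / (norm_a * norm_b)


def retrieve_description(
    query: Sequence[float],
    database: List[Tuple[Sequence[float], str]],
    k: int = 1,
) -> List[str]:
    """Return the descriptions of the k database images most similar to the query.

    Assumption: cosine similarity ranks the images; the paper's actual
    similarity measure is not given in the abstract.
    """
    if len(query) == 0 or any(len(f) != len(query) for f, _ in database):
        raise ValueError("all feature vectors must have the same non-zero length")
    ranked = sorted(
        database,
        key=lambda item: cosine_similarity(query, item[0]),
        reverse=True,
    )
    return [desc for _, desc in ranked[:k]]


# Hypothetical toy database; real VGG16 features would be much larger vectors.
db = [
    ([0.9, 0.1, 0.0, 0.2], "chest x-ray, no acute findings"),
    ([0.1, 0.8, 0.3, 0.0], "axial brain MRI, T2-weighted"),
    ([0.0, 0.2, 0.9, 0.4], "abdominal CT with contrast"),
]

print(retrieve_description([0.2, 0.7, 0.4, 0.1], db))  # → ['axial brain MRI, T2-weighted']
```

The retrieved description serves as the answer label for the query image. Returning the top k descriptions rather than only the best one leaves room for voting or re-ranking, which is a design choice of this sketch rather than something the abstract specifies.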
Publisher
Springer Science and Business Media LLC
Subject
Radiology, Nuclear Medicine and Imaging
References: 55 articles
Cited by
16 articles.