
Joint embedding VQA model based on dynamic word vector

Published: 2021-03-03
Volume: 7
Page: e353
ISSN: 2376-5992
Journal: PeerJ Computer Science
Language: en

Authors:

Ma Zhiyang¹, Zheng Wenfeng¹, Chen Xiaobing¹, Yin Lirong²

Affiliations:

1. School of Automation, University of Electronic Science and Technology of China, Chengdu, P. R. China

2. Department of Geography and Anthropology, Louisiana State University, LA, USA

Abstract

Existing joint embedding Visual Question Answering (VQA) models combine different methods of image characterization, text characterization, and feature fusion, but all of them use static word vectors for text characterization. In real language, however, the same word may carry different meanings in different contexts and may also serve as different grammatical components. Static word vectors, which assign each word a single fixed vector regardless of context, cannot express these differences effectively, so semantic and grammatical deviations may arise. To address this problem, this article constructs a joint embedding model based on dynamic word vectors, the none KB-Specific network (N-KBSN), which differs from commonly used VQA models built on static word vectors. The N-KBSN model consists of three main parts: a question text and image feature extraction module, a self-attention and guided-attention module, and a feature fusion and classifier module. Its key components are image characterization based on Faster R-CNN, text characterization based on ELMo, and feature enhancement based on a multi-head attention mechanism. Experimental results show that N-KBSN outperforms the 2017 winner (GloVe) model and the 2019 winner (GloVe) model, and that introducing dynamic word vectors improves overall accuracy.
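The self-attention (SA) and guided-attention (GA) module described above can be illustrated with a small sketch. This is not the authors' implementation. It is a dependency-free Python toy that assumes (a) the GA unit takes its queries from image region features and its keys and values from question word features, (b) heads are formed by slicing the feature dimension rather than by learned projection matrices, (c) layer normalization, feed-forward sublayers, and the classifier are omitted, and (d) fusion is an element-wise sum of mean-pooled features. The helper names and the feature values are hypothetical; in N-KBSN the inputs would come from Faster R-CNN (image) and ELMo (question).

```python
import math
from typing import List

Matrix = List[List[float]]  # rows = tokens (words or image regions), columns = features


def softmax(row: List[float]) -> List[float]:
    """Numerically stable softmax over one row of scores."""
    m = max(row)
    exps = [math.exp(v - m) for v in row]
    total = sum(exps)
    return [e / total for e in exps]


def attention(q: Matrix, k: Matrix, v: Matrix) -> Matrix:
    """Scaled dot-product attention: softmax(q k^T / sqrt(d)) v."""
    d = len(q[0])
    out = []
    for qi in q:
        scores = [sum(a * b for a, b in zip(qi, kj)) / math.sqrt(d) for kj in k]
        weights = softmax(scores)
        out.append([sum(w * vj[c] for w, vj in zip(weights, v)) for c in range(len(v[0]))])
    return out


def multi_head_attention(q: Matrix, k: Matrix, v: Matrix, heads: int) -> Matrix:
    """Split the feature dimension into `heads` slices, attend within each slice,
    then concatenate. Simplification: no learned W_Q, W_K, W_V or W_O projections."""
    d = len(q[0])
    if d % heads != 0:
        raise ValueError("feature dimension must be divisible by the number of heads")
    size = d // heads

    def cols(m: Matrix, h: int) -> Matrix:
        return [row[h * size:(h + 1) * size] for row in m]

    per_head = [attention(cols(q, h), cols(k, h), cols(v, h)) for h in range(heads)]
    # Concatenate the per-head outputs for each query row.
    return [sum((per_head[h][i] for h in range(heads)), []) for i in range(len(q))]


def add(a: Matrix, b: Matrix) -> Matrix:
    """Element-wise sum, used here as a residual connection."""
    return [[x + y for x, y in zip(ra, rb)] for ra, rb in zip(a, b)]


def self_attention(x: Matrix, heads: int) -> Matrix:
    """SA unit: queries, keys and values all come from x (layer norm and FFN omitted)."""
    return add(x, multi_head_attention(x, x, x, heads))


def guided_attention(y: Matrix, x: Matrix, heads: int) -> Matrix:
    """GA unit (assumed layout): queries from y (image regions), keys/values from x (question words)."""
    return add(y, multi_head_attention(y, x, x, heads))


def mean_pool(m: Matrix) -> List[float]:
    """Average over rows to get one vector per modality."""
    return [sum(col) / len(m) for col in zip(*m)]


# Hypothetical toy inputs: 3 question words and 2 image regions, 4 features each.
question = [[0.1, 0.3, -0.2, 0.5], [0.4, -0.1, 0.2, 0.0], [0.0, 0.2, 0.1, -0.3]]
image = [[0.5, 0.1, 0.0, 0.2], [-0.2, 0.4, 0.3, 0.1]]

q_enc = self_attention(question, heads=2)
v_enc = guided_attention(self_attention(image, heads=2), q_enc, heads=2)
# Assumed fusion: element-wise sum of the pooled vectors (classifier omitted).
fused = [a + b for a, b in zip(mean_pool(q_enc), mean_pool(v_enc))]
print(len(fused))  # 4
```

Stacking several SA and GA layers and replacing the feature slicing with learned projections would move this toward a trainable model; the toy keeps only the data flow between the question and image branches visible.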

Funder

Sichuan Science and Technology Program

Fundamental Research Funds for the Central Universities

Publisher

PeerJ

Subject

General Computer Science

Link

https://peerj.com/articles/cs-353.pdf

References (41 articles)

1. Abacha (2019). VQA-Med: Overview of the Medical Visual Question Answering Task at ImageCLEF 2019.

2. Anderson (2018). Bottom-up and top-down attention for image captioning and visual question answering.

3. Antol (2015). VQA: Visual question answering.

4. Atkinson (2019). The text-based adventure AI competition. IEEE Transactions on Games.

5. Chen (2015). ABC-CNN: An attention based convolutional neural network for visual question answering. arXiv.

Cited by (135 articles)

1. Design of knowledge incorporated VQA based on spatial GCNN with structured sentence embedding and linking algorithm. Journal of Intelligent & Fuzzy Systems, 2023-12-02.

2. Combined Channel and Spatial Attention-Based Stereo Endoscopic Image Super-Resolution. TENCON 2023 - 2023 IEEE Region 10 Conference (TENCON), 2023-10-31.

3. Task-Based Visual Attention for Continually Improving the Performance of Autonomous Game Agents. Electronics, 2023-10-25.

4. Knowledge-driven intelligent recommendation method for emergency plans in water diversion projects. Journal of Hydroinformatics, 2023-10-25.

5. Design of a Modified Transformer Architecture Based on Relative Position Coding. International Journal of Computational Intelligence Systems, 2023-10-23.