
Visual Question Answering for Intelligent Interaction

Published: 2022-07-06
Volume: 2022
Pages: 1-6
ISSN: 1875-905X
Journal: Mobile Information Systems
Language: English

Author:

Gao Panpan¹, Sun Hanxu¹, Chen Gang¹, Wang Ruiquan¹, Li Minggang¹

Affiliation:

1. School of Modern Post (School of Automation), Beijing University of Posts and Telecommunications, Beijing, China

Abstract

With the application of deep learning to image processing, image-related intelligent interaction technology has developed rapidly. Visual question answering (VQA) gathers information from an image by asking questions about its content, with the ultimate aim of enriching image understanding. Vision and language are two core faculties through which human intelligence understands the real world, and they are also basic building blocks for realizing artificial intelligence; each has been studied extensively in its own field. As deep learning continues to be adopted in computer vision and natural language processing, VQA, which spans both disciplines, has become a research hotspot in recent years. As an emerging research direction, however, VQA systems still face substantial challenges that call for further study. Through a comprehensive comparison and analysis of existing VQA models and methods, this paper summarizes the shortcomings and development directions of current research, and analyzes several VQA models in terms of how they process the image input and the question input, their working principles, and the public datasets commonly used with them. It concludes that extending structured knowledge bases and applying mature techniques such as text question answering and natural language processing to VQA problems are future development directions for VQA models.
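The abstract refers to how VQA models process the image input and the question input. As a rough illustration only, the sketch below shows the joint-embedding pattern that many VQA baselines follow: encode the image, encode the question, fuse the two vectors, and score a fixed vocabulary of candidate answers as a classification problem. It is not the method of any particular model surveyed in the paper. Assumptions: the image is already reduced to a feature vector (standing in for pooled features from a pretrained convolutional network), the question encoder is a simple average of word embeddings swapped in for the recurrent encoders many models use, fusion is an element-wise product, and all function names, embeddings, and weights are hypothetical toy values.

```python
import math


def l2_normalize(vec):
    """Scale a vector to unit L2 norm; a zero vector is returned unchanged."""
    norm = math.sqrt(sum(x * x for x in vec))
    return [x / norm for x in vec] if norm > 0 else list(vec)


def encode_question(tokens, embeddings, dim):
    """Hypothetical bag-of-words encoder: average the embeddings of known tokens.
    A simplified stand-in for an LSTM/GRU question encoder."""
    vectors = [embeddings[t] for t in tokens if t in embeddings]
    if not vectors:
        return [0.0] * dim
    return [sum(col) / len(vectors) for col in zip(*vectors)]


def fuse(image_feat, question_feat):
    """Element-wise (Hadamard) product fusion of the two modalities."""
    return [i * q for i, q in zip(image_feat, question_feat)]


def softmax(scores):
    """Numerically stable softmax over a list of scores."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def predict_answer(image_feat, tokens, embeddings, answer_weights, answers):
    """Score each candidate answer with a linear layer over the fused vector
    and return the most probable answer plus the full distribution."""
    dim = len(image_feat)
    q = encode_question(tokens, embeddings, dim)
    joint = fuse(l2_normalize(image_feat), l2_normalize(q))
    scores = [sum(w * x for w, x in zip(row, joint)) for row in answer_weights]
    probs = softmax(scores)
    best = max(range(len(answers)), key=lambda k: probs[k])
    return answers[best], probs


if __name__ == "__main__":
    # Toy 3-dimensional example with hand-set (hypothetical) values.
    image = [1.0, 0.0, 0.0]
    emb = {"what": [1.0, 1.0, 1.0], "color": [1.0, 0.0, 0.0]}
    weights = [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0]]
    label, dist = predict_answer(image, ["what", "color"], emb, weights, ["red", "two"])
    print(label, dist)
```

Treating the answer as a choice from a fixed vocabulary keeps the output layer simple; a real model would learn the encoders, embeddings, and answer weights from a VQA dataset rather than set them by hand.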

Funder

National Natural Science Foundation of China

Publisher

Hindawi Limited

Subject

Computer Networks and Communications, Computer Science Applications

Link

http://downloads.hindawi.com/journals/misy/2022/4232968.pdf

References: 19 articles

1. K. Simonyan. Very Deep Convolutional Networks for Large-Scale Image Recognition. Computer Vision and Pattern Recognition, 2014.

2. K. He. Deep Residual Learning for Image Recognition.

3. J. Donahue. Long-Term Recurrent Convolutional Networks for Visual Recognition and Description.

4. K. Simonyan. Two-Stream Convolutional Networks for Action Recognition in Videos. Computer Vision and Pattern Recognition, 2014.

5. J. Redmon. You Only Look Once: Unified, Real-Time Object Detection.

Cited by 1 article.

1. Visual Question Answering in Malayalam Text. 2024 3rd International Conference on Sentiment Analysis and Deep Learning (ICSADL), 2024-03-13.