Visual Question Answering for Intelligent Interaction

Author:

Gao Panpan1ORCID,Sun Hanxu1,Chen Gang1ORCID,Wang Ruiquan1,Li Minggang1

Affiliation:

1. School of Modern Post (School of Automation), Beijing University of Posts and Telecommunications, Beijing, China

Abstract

With the application of deep learning method in the field of image processing, the image-related intelligent interaction technology has also been rapidly developed. Visual question answering (VQA) collects the image information by asking questions related to the image and ultimately achieves the purpose for enriching the image understanding. Vision and language are the two core parts of human intelligence to understand the real world, and also the basic components to realize artificial intelligence, and a lot of research has been carried out in their respective fields. With the continuous promotion and application of deep learning in the fields of computer vision and natural language processing, visual question answering technology across the visual field and natural language disciplines has become a research hotspot in recent years. Visual question answering (VQA) for intelligent interaction collects image information by asking relevant questions to the content of the image and finally achieves the purpose of enriching image understanding. At the same time, as an emerging research direction, the challenges faced by the visual question answering system are huge, and we need to learn and excavate. Through the comprehensive comparison and analysis of the existing models and methods of visual question answering, this paper summarizes the shortcomings and development directions of the current research work and analyzes several models of visual question answering technology for the processing of image input and question input of the visual question answering model. The working principle of the model and the common public data set of the model: it is concluded that extending the structured knowledge base and applying mature technologies such as text question answering and natural language processing to deal with VQA problems are the future development directions of the VQA model.

Funder

National Natural Science Foundation of China

Publisher

Hindawi Limited

Subject

Computer Networks and Communications,Computer Science Applications

Reference19 articles.

1. Very Deep Convolutional Networks for Large-Scale Image Recognition Computer Vision and Pattern Recognition;K. Simonyan,2014

2. Deep Residual Learning for Image recognition;K. He

3. Long term recurrent convolutional networks forvisual recognition and description;J. Donahue

4. Two Stream Convolutional Networks for Action Recognition in videos;K. Simonyan;Computer Vision and Pattern Recognition,2014

5. You Only Look once: Unified,real-Time Object detection;J. Redmon

Cited by 1 articles. 订阅此论文施引文献 订阅此论文施引文献,注册后可以免费订阅5篇论文的施引文献,订阅后可以查看论文全部施引文献

1. Visual Question Answering in Malayalam Text;2024 3rd International Conference on Sentiment Analysis and Deep Learning (ICSADL);2024-03-13

同舟云学术

1.学者识别学者识别

2.学术分析学术分析

3.人才评估人才评估

"同舟云学术"是以全球学者为主线,采集、加工和组织学术论文而形成的新型学术文献查询和分析系统,可以对全球学者进行文献检索和人才价值评估。用户可以通过关注某些学科领域的顶尖人物而持续追踪该领域的学科进展和研究前沿。经过近期的数据扩容,当前同舟云学术共收录了国内外主流学术期刊6万余种,收集的期刊论文及会议论文总量共计约1.5亿篇,并以每天添加12000余篇中外论文的速度递增。我们也可以为用户提供个性化、定制化的学者数据。欢迎来电咨询!咨询电话:010-8811{复制后删除}0370

www.globalauthorid.com

TOP

Copyright © 2019-2024 北京同舟云网络信息技术有限公司
京公网安备11010802033243号  京ICP备18003416号-3