Trends in Integration of Vision and Language Research: A Survey of Tasks, Datasets, and Methods

Published: 2021-08-30 Volume: 71 Page: 1183-1317
ISSN: 1076-9757
Container-title: Journal of Artificial Intelligence Research
Short-container-title: jair

Author:

Mogadala Aditya,Kalimuthu Marimuthu,Klakow Dietrich

Abstract

Interest in Artificial Intelligence (AI) and its applications has seen unprecedented growth in the last few years. This success can be partly attributed to the advancements made in the sub-fields of AI such as machine learning, computer vision, and natural language processing. Much of the growth in these fields has been made possible with deep learning, a sub-area of machine learning that uses artificial neural networks. This has created significant interest in the integration of vision and language. In this survey, we focus on ten prominent tasks that integrate language and vision by discussing their problem formulation, methods, existing datasets, and evaluation measures, and by comparing the results obtained with corresponding state-of-the-art methods. Our efforts go beyond earlier surveys, which are either task-specific or concentrate only on one type of visual content, i.e., image or video. Furthermore, we also provide some potential future directions in this field of research, in the hope that this survey stimulates innovative thoughts and ideas to address the existing challenges and build new applications.

Publisher

AI Access Foundation

Subject

Artificial Intelligence

Cited by 44 articles.

1. A survey on knowledge-enhanced multimodal learning;Artificial Intelligence Review;2024-09-09

2. Heterogeneous Contrastive Learning for Foundation Models and Beyond;Proceedings of the 30th ACM SIGKDD Conference on Knowledge Discovery and Data Mining;2024-08-24

3. Robust Visual Question Answering: Datasets, Methods, and Future Challenges;IEEE Transactions on Pattern Analysis and Machine Intelligence;2024-08

4. Open-Vocabulary Part-Level Detection and Segmentation for Human–Robot Interaction;Applied Sciences;2024-07-21

5. Transparent and trustworthy interpretation of COVID-19 features in chest X-rays using explainable AI;Multimedia Tools and Applications;2024-07-17