Imagined Visual Representations as Multimodal Embeddings

Published:2017-02-12 Issue:1 Volume:31
ISSN:2374-3468
Container-title:Proceedings of the AAAI Conference on Artificial Intelligence
Short-container-title:AAAI

Author:

Collell Guillem, Zhang Ted, Moens Marie-Francine

Abstract

Language and vision provide complementary information. Integrating both modalities in a single multimodal representation is an unsolved problem with wide-reaching applications to both natural language processing and computer vision. In this paper, we present a simple and effective method that learns a language-to-vision mapping and uses its output visual predictions to build multimodal representations. In this sense, our method provides a cognitively plausible way of building representations, consistent with the inherently re-constructive and associative nature of human memory. Using seven benchmark concept similarity tests we show that the mapped (or imagined) vectors not only help to fuse multimodal information, but also outperform strong unimodal baselines and state-of-the-art multimodal methods, thus exhibiting more human-like judgments. Ultimately, the present work sheds light on fundamental questions of natural language understanding concerning the fusion of vision and language such as the plausibility of more associative and re-constructive approaches.
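
The pipeline the abstract describes (learn a language-to-vision mapping, then use its predicted "imagined" visual vectors to build multimodal representations) can be illustrated with a minimal sketch. Everything specific here is an assumption rather than the authors' implementation: the mapping is taken to be a ridge-regularized linear map, fusion is taken to be concatenation of L2-normalized vectors, the data is synthetic, and the function names `fit_linear_mapping` and `imagined_multimodal` are hypothetical.

```python
import numpy as np


def fit_linear_mapping(text_vecs, visual_vecs, l2=1.0):
    """Fit W minimizing ||text_vecs @ W - visual_vecs||^2 + l2 * ||W||^2.

    Assumption: a ridge-regularized linear map stands in for the
    language-to-vision mapping; the abstract does not specify its form.
    """
    d_text = text_vecs.shape[1]
    gram = text_vecs.T @ text_vecs + l2 * np.eye(d_text)
    return np.linalg.solve(gram, text_vecs.T @ visual_vecs)


def l2_normalize(x):
    """Scale each row to unit Euclidean length."""
    norms = np.linalg.norm(x, axis=1, keepdims=True)
    return x / np.maximum(norms, 1e-12)


def imagined_multimodal(text_vecs, W):
    """Concatenate normalized text vectors with their imagined visual vectors.

    Assumption: concatenation is used as the fusion step for illustration.
    """
    imagined = text_vecs @ W  # predicted ("imagined") visual vectors
    return np.concatenate(
        [l2_normalize(text_vecs), l2_normalize(imagined)], axis=1
    )


def cosine(a, b):
    """Cosine similarity, the usual score for concept similarity tests."""
    return float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b)))


# Synthetic stand-ins for word embeddings (50-d) and visual features (30-d).
rng = np.random.default_rng(0)
train_text = rng.normal(size=(200, 50))
true_map = rng.normal(size=(50, 30))
train_visual = train_text @ true_map + 0.1 * rng.normal(size=(200, 30))

W = fit_linear_mapping(train_text, train_visual, l2=1.0)

# Words without any image can still receive a multimodal vector.
new_text = rng.normal(size=(5, 50))
multimodal = imagined_multimodal(new_text, W)
print(multimodal.shape)  # (5, 80)
```

Because the visual half is predicted from text, a word needs no paired image at inference time. Similarity between two concepts would then be scored with `cosine` on their multimodal vectors and compared against human ratings.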

Publisher

Association for the Advancement of Artificial Intelligence (AAAI)

Subject

General Medicine

Cited by 15 articles.

1. MuSAM: Mutual-Scenario-Aware Multimodal-Enhanced Representation Learning for Semantic Similarity;IEEE Transactions on Industrial Informatics;2024-09

2. How direct is the link between words and images?;The Mental Lexicon;2024-01-11

3. Semantics Fusion of Hierarchical Transformers for Multimodal Named Entity Recognition;Lecture Notes in Computer Science;2024

4. Language with vision: A study on grounded word and sentence embeddings;Behavior Research Methods;2023-12-19

5. Self-supervised Multimodal Representation Learning for Product Identification and Retrieval;Communications in Computer and Information Science;2023-11-27