Grounded Compositional Semantics for Finding and Describing Images with Sentences

Published: 2014-12 Volume: 2 Page: 207-218
ISSN: 2307-387X
Container-title: Transactions of the Association for Computational Linguistics
Language: en
Short-container-title: TACL

Author:

Socher Richard¹, Karpathy Andrej¹, Le Quoc V.², Manning Christopher D.¹, Ng Andrew Y.¹

Affiliation:

1. Stanford University, Computer Science Department

2. Google Inc.

Abstract

Previous work on Recursive Neural Networks (RNNs) shows that these models can produce compositional feature vectors for accurately representing and classifying sentences or images. However, the sentence vectors of previous models cannot accurately represent visually grounded meaning. We introduce the DT-RNN model which uses dependency trees to embed sentences into a vector space in order to retrieve images that are described by those sentences. Unlike previous RNN-based models which use constituency trees, DT-RNNs naturally focus on the action and agents in a sentence. They are better able to abstract from the details of word order and syntactic expression. DT-RNNs outperform other recursive and recurrent neural networks, kernelized CCA and a bag-of-words baseline on the tasks of finding an image that fits a sentence description and vice versa. They also give more similar representations to sentences that describe the same image.
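As a rough illustration of the idea in the abstract (composing a sentence vector bottom-up over a dependency tree, then ranking images against it), here is a toy sketch. It is not the paper's formulation: the `Node` class, the 2-dimensional vectors in `WORD_VECS` and `IMAGE_VECS`, the plain summation of child vectors, and the dot-product score are all assumptions made for this example, and nothing here is learned.

```python
import math


class Node:
    """One word in a dependency tree, with its dependents as children."""

    def __init__(self, word, children=None):
        self.word = word
        self.children = children or []


# Hypothetical 2-d word vectors; a real model would learn these.
WORD_VECS = {
    "dog": [1.0, 0.0],
    "runs": [0.5, 0.5],
    "ball": [0.0, 1.0],
}

# Hypothetical image vectors assumed to live in the same space.
IMAGE_VECS = {
    "dog_photo": [1.0, 0.1],
    "ball_photo": [0.1, 1.0],
}


def compose(node):
    """Build a node vector from its word vector plus its children's vectors."""
    vec = list(WORD_VECS[node.word])
    for child in node.children:
        child_vec = compose(child)
        vec = [a + b for a, b in zip(vec, child_vec)]
    # Squash with tanh so values stay bounded as the tree grows.
    return [math.tanh(v) for v in vec]


def rank_images(sentence_vec, image_vecs):
    """Return image names sorted by dot-product score, best match first."""

    def score(name):
        return sum(a * b for a, b in zip(sentence_vec, image_vecs[name]))

    return sorted(image_vecs, key=score, reverse=True)


# "dog runs": the verb is the root and its subject is a dependent.
sentence = Node("runs", [Node("dog")])
sentence_vec = compose(sentence)
ranking = rank_images(sentence_vec, IMAGE_VECS)
print(ranking)
```

Because the root of a dependency tree is typically the main verb, the final vector is anchored on the action and its arguments, which is the intuition the abstract gives for why dependency trees suit this retrieval task.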

Publisher

MIT Press - Journals

Subject

Artificial Intelligence, Computer Science Applications, Linguistics and Language, Human-Computer Interaction, Communication

Link

https://www.mitpressjournals.org/doi/pdf/10.1162/tacl_a_00177

References: 5 articles.

1. Distributional Memory: A General Framework for Corpus-Based Semantics

2. Framing Image Description as a Ranking Task: Data, Models and Evaluation Metrics

3. Composition in Distributional Models of Semantics

4. From Frequency to Meaning: Vector Space Models of Semantics

Cited by 266 articles.

1. AERMNet: Attention-enhanced relational memory network for medical image report generation;Computer Methods and Programs in Biomedicine;2024-02

2. Context‐aware relation enhancement and similarity reasoning for image‐text retrieval;IET Computer Vision;2024-01-30

3. A survey on multimodal bidirectional machine learning translation of image and natural language processing;Expert Systems with Applications;2024-01

4. Content-based Search for Deep Generative Models;SIGGRAPH Asia 2023 Conference Papers;2023-12-10

5. Alignment efficient image-sentence retrieval considering transferable cross-modal representation learning;Frontiers of Computer Science;2023-12-02