Using Syntax to Ground Referring Expressions in Natural Images

Published: 2018-04-27
Volume: 32, Issue: 1
ISSN: 2374-3468
Container title: Proceedings of the AAAI Conference on Artificial Intelligence
Short container title: AAAI

Authors:

Volkan Cirik, Taylor Berg-Kirkpatrick, Louis-Philippe Morency

Abstract

We introduce GroundNet, a neural network for referring expression recognition: the task of localizing (or grounding) in an image the object referred to by a natural language expression. Our approach to this task is the first to rely on a syntactic analysis of the input referring expression to inform the structure of the computation graph. Given a parse tree for an input expression, we explicitly map the syntactic constituents and relationships present in the tree to a composed graph of neural modules that defines our architecture for performing localization. This syntax-based approach aids localization of both the target object and auxiliary supporting objects mentioned in the expression. As a result, GroundNet is more interpretable than previous methods: we can (1) determine which phrase of the referring expression points to which object in the image and (2) track how the network determines the localization of the target object. We study this property empirically by introducing a new set of annotations on the GoogleRef dataset to evaluate localization of supporting objects. Our experiments show that GroundNet achieves state-of-the-art accuracy in identifying supporting objects while maintaining comparable performance in the localization of target objects.
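To make the idea of a syntax-driven computation graph concrete, here is a minimal sketch in plain Python. It is not GroundNet's implementation or API. The parse tree is written by hand rather than produced by a parser. The two modules, `locate` and `relate`, are hypothetical, as are the toy attribute-matching scores and the hand-coded "left of"/"right of" geometry; all of them stand in for learned neural modules and visual features. What the sketch does show is how the shape of the tree determines how modules are composed, and how each constituent leaves an inspectable attention over candidate boxes.

```python
import math
from dataclasses import dataclass
from typing import List, Tuple, Union


@dataclass
class Box:
    """A candidate image region (hypothetical toy attributes plus a center x-coordinate)."""
    name: str
    attributes: frozenset
    x: float


@dataclass
class Phrase:
    """Leaf constituent, e.g. the noun phrase 'red mug'."""
    text: str


@dataclass
class Relation:
    """Internal constituent '<head> <relation> <support>', e.g. 'red mug left of laptop'."""
    head: "Node"
    relation: str
    support: "Node"


Node = Union[Phrase, Relation]
Trace = List[Tuple[str, List[float]]]


def softmax(scores: List[float]) -> List[float]:
    m = max(scores)  # subtract the max for numerical stability
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def locate(phrase: str, boxes: List[Box], sharpness: float = 4.0) -> List[float]:
    # Toy stand-in for a learned "locate" module: count phrase words that appear
    # among each box's attributes, then turn the counts into an attention distribution.
    words = phrase.lower().split()
    return softmax([sharpness * sum(w in b.attributes for w in words) for b in boxes])


def relation_score(relation: str, a: Box, b: Box) -> float:
    # Hypothetical hand-written geometry; a real model would learn this.
    if relation == "left of":
        return 1.0 if a.x < b.x else 0.0
    if relation == "right of":
        return 1.0 if a.x > b.x else 0.0
    raise ValueError(f"unknown relation: {relation}")


def relate(relation: str, head: List[float], support: List[float], boxes: List[Box]) -> List[float]:
    # Candidate i keeps its head attention, weighted by how strongly the relation
    # holds against the supporting objects: sum_j support[j] * rel(i, j).
    scores = []
    for i, a in enumerate(boxes):
        evidence = sum(support[j] * relation_score(relation, a, b)
                       for j, b in enumerate(boxes) if j != i)
        scores.append(head[i] * evidence)
    total = sum(scores) or 1.0  # avoid dividing by zero if nothing matches
    return [s / total for s in scores]


def ground(node: Node, boxes: List[Box], trace: Trace) -> List[float]:
    # Evaluate the module graph whose shape is dictated by the parse tree,
    # recording every constituent's attention so it can be inspected afterwards.
    if isinstance(node, Phrase):
        attn = locate(node.text, boxes)
        trace.append((node.text, attn))
        return attn
    head = ground(node.head, boxes, trace)
    support = ground(node.support, boxes, trace)
    attn = relate(node.relation, head, support, boxes)
    trace.append((node.relation, attn))
    return attn


# Usage: "the red mug left of the laptop", parsed by hand.
boxes = [
    Box("mug_a", frozenset({"mug", "red"}), x=0.2),
    Box("laptop", frozenset({"laptop", "silver"}), x=0.5),
    Box("mug_b", frozenset({"mug", "red"}), x=0.8),
]
tree = Relation(Phrase("red mug"), "left of", Phrase("laptop"))
trace: Trace = []
attn = ground(tree, boxes, trace)
best = boxes[max(range(len(boxes)), key=attn.__getitem__)].name
print(best)  # prints mug_a
```

Reading `trace` reflects the interpretability property claimed in the abstract, at least in this toy setting. The supporting phrase "laptop" gets its own attention, peaked on the laptop box. So one can check which phrase grounded to which object, and see how the target's final attention was derived, rather than only reading off the final answer.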

Publisher

Association for the Advancement of Artificial Intelligence (AAAI)

Subject

General Medicine

Cited by 13 articles.

1. Foundations & Trends in Multimodal Machine Learning: Principles, Challenges, and Open Questions. ACM Computing Surveys, 2024-06-22.

2. Video captioning – a survey. Multimedia Tools and Applications, 2024-04-09.

3. Construction grammar and procedural semantics for human-interpretable grounded language processing. Linguistics Vanguard, 2024-03-15.

4. Room-Object Entity Prompting and Reasoning for Embodied Referring Expression. IEEE Transactions on Pattern Analysis and Machine Intelligence, 2024-02.

5. Neural module networks: A review. Neurocomputing, 2023-10.