
Jointly Learning to Parse and Perceive: Connecting Natural Language to the Physical World

Published: 2013-12 | Volume: 1 | Pages: 193-206
ISSN: 2307-387X
Journal: Transactions of the Association for Computational Linguistics (TACL)
Language: en

Authors:

Jayant Krishnamurthy¹, Thomas Kollar¹

Affiliation:

1. Computer Science Department, Carnegie Mellon University

Abstract

This paper introduces Logical Semantics with Perception (LSP), a model for grounded language acquisition that learns to map natural language statements to their referents in a physical environment. For example, given an image, LSP can map the statement “blue mug on the table” to the set of image segments showing blue mugs on tables. LSP learns physical representations for both categorical (“blue,” “mug”) and relational (“on”) language, and also learns to compose these representations to produce the referents of entire statements. We further introduce a weakly supervised training procedure that estimates LSP’s parameters using annotated referents for entire statements, without annotated referents for individual words or the parse structure of the statement. We perform experiments on two applications: scene understanding and geographical question answering. We find that LSP outperforms existing, less expressive models that cannot represent relational language. We further find that weakly supervised training is competitive with fully supervised training while requiring significantly less annotation effort.
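To make the abstract's example concrete, here is a minimal sketch of how categorical and relational predicates can compose to pick out the referents of "blue mug on the table" in a toy scene. It assumes the statement has already been parsed into a form like λx. ∃y. blue(x) ∧ mug(x) ∧ on(x, y) ∧ table(y). The `Segment` fields, the hand-written predicates (stand-ins for learned categorical and relational classifiers), the `on` geometry heuristic, and the `referents` helper are all hypothetical. Nothing here reproduces LSP's parser, perceptual features, or weakly supervised training.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List, Set, Tuple


@dataclass(frozen=True)
class Segment:
    """A hypothetical image segment with hand-picked toy features."""
    sid: int
    color: str
    kind: str
    box: Tuple[float, float, float, float]  # (x_min, y_min, x_max, y_max); image y grows downward


# Categorical (one-argument) predicates: hypothetical stand-ins for learned classifiers.
CATEGORIES: Dict[str, Callable[[Segment], bool]] = {
    "blue": lambda s: s.color == "blue",
    "mug": lambda s: s.kind == "mug",
    "table": lambda s: s.kind == "table",
}


def on(a: Segment, b: Segment) -> bool:
    """Toy relational predicate: a's bottom edge is near b's top edge and they overlap horizontally."""
    ax0, _, ax1, ay1 = a.box
    bx0, by0, bx1, _ = b.box
    horizontal_overlap = ax0 < bx1 and bx0 < ax1
    touching = abs(ay1 - by0) <= 5.0
    return a.sid != b.sid and horizontal_overlap and touching


# Relational (two-argument) predicates, keyed by the word they ground.
RELATIONS: Dict[str, Callable[[Segment, Segment], bool]] = {"on": on}


def referents(segments: List[Segment], head_cats: List[str],
              relation: str, arg_cats: List[str]) -> Set[int]:
    """Evaluate λx. ∃y. head_cats(x) ∧ relation(x, y) ∧ arg_cats(y) and return matching segment ids."""
    def satisfies(s: Segment, cats: List[str]) -> bool:
        return all(CATEGORIES[c](s) for c in cats)

    heads = [s for s in segments if satisfies(s, head_cats)]  # candidates for x
    args = [s for s in segments if satisfies(s, arg_cats)]    # candidates for y
    rel = RELATIONS[relation]
    return {x.sid for x in heads if any(rel(x, y) for y in args)}


# A toy scene: one table, two mugs resting on it, and a blue mug elsewhere.
scene = [
    Segment(0, "brown", "table", (0, 100, 300, 200)),
    Segment(1, "blue", "mug", (20, 60, 60, 100)),
    Segment(2, "red", "mug", (100, 60, 140, 100)),
    Segment(3, "blue", "mug", (400, 250, 440, 290)),
]

if __name__ == "__main__":
    print(referents(scene, ["blue", "mug"], "on", ["table"]))  # → {1}
```

Filtering candidates with the categorical predicates before checking the relation mirrors how composition narrows the referent set: dropping "blue" from the head categories returns both mugs on the table (segments 1 and 2), while the blue mug off the table (segment 3) is excluded either way.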

Publisher

MIT Press - Journals

Link

https://www.mitpressjournals.org/doi/pdf/10.1162/tacl_a_00220


Cited by 35 articles.

1. Towards Faithful Model Explanation in NLP: A Survey. Computational Linguistics, 2024.

2. Scene representation using a new two-branch neural network model. The Visual Computer, 2023-12-01.

3. Multi-modal interaction with transformers: bridging robots and human with natural language. Robotica, 2023-11-13.

4. Hierarchical Attention Networks for Fact-based Visual Question Answering. Multimedia Tools and Applications, 2023-07-22.

5. Scene Understanding for Autonomous Driving Using Visual Question Answering. 2023 International Joint Conference on Neural Networks (IJCNN), 2023-06-18.