Word Embeddings as Metric Recovery in Semantic Spaces

Published: 2016-12; Volume: 4; Pages: 273-286
ISSN: 2307-387X
Container-title: Transactions of the Association for Computational Linguistics
Language: en
Short-container-title: TACL

Author:

Hashimoto Tatsunori B.¹, Alvarez-Melis David¹, Jaakkola Tommi S.¹

Affiliation:

1. CSAIL, Massachusetts Institute of Technology

Abstract

Continuous word representations have been remarkably useful across NLP tasks but remain poorly understood. We ground word embeddings in semantic spaces studied in the cognitive-psychometric literature, taking these spaces as the primary objects to recover. To this end, we relate log co-occurrences of words in large corpora to semantic similarity assessments and show that co-occurrences are indeed consistent with a Euclidean semantic space hypothesis. Framing word embedding as metric recovery of a semantic space unifies existing word embedding algorithms, ties them to manifold learning, and demonstrates that existing algorithms are consistent metric recovery methods given co-occurrence counts from random walks. Furthermore, we propose a simple, principled, direct metric recovery algorithm that performs on par with state-of-the-art word embedding and manifold learning methods. Finally, we complement the recent focus on analogies by constructing two new inductive reasoning datasets (series completion and classification) and demonstrate that word embeddings can be used to solve them as well.
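
The metric recovery idea in the abstract can be illustrated with a toy sketch. It is not the algorithm proposed in the paper. It assumes, purely for illustration, that the log co-occurrence of two words equals the negative squared Euclidean distance between their latent points, with no per-word offsets and no sampling noise, and it swaps in classical multidimensional scaling (MDS) as the recovery step. The function names and the `scale` parameter are hypothetical.

```python
import numpy as np


def pairwise_distances(points):
    """Euclidean distance between every pair of rows in `points`."""
    diffs = points[:, None, :] - points[None, :, :]
    return np.sqrt(np.sum(diffs ** 2, axis=-1))


def simulate_log_cooccurrence(points, scale=1.0):
    """Toy assumption: log C_ij = -||x_i - x_j||^2 / scale (offsets dropped)."""
    return -(pairwise_distances(points) ** 2) / scale


def recover_embedding(log_cooc, dim, scale=1.0):
    """Classical MDS on the squared distances implied by log co-occurrences."""
    sq_dists = -scale * log_cooc
    n = sq_dists.shape[0]
    # Double centering turns squared distances into a Gram (inner product) matrix.
    centering = np.eye(n) - np.ones((n, n)) / n
    gram = -0.5 * centering @ sq_dists @ centering
    eigvals, eigvecs = np.linalg.eigh(gram)
    # Keep the `dim` largest eigenpairs; clip tiny negative values from round-off.
    top = np.argsort(eigvals)[::-1][:dim]
    return eigvecs[:, top] * np.sqrt(np.clip(eigvals[top], 0.0, None))


rng = np.random.default_rng(0)
latent = rng.normal(size=(20, 2))  # hypothetical "semantic space" coordinates
recovered = recover_embedding(simulate_log_cooccurrence(latent), dim=2)

# The embedding is recovered only up to rotation, reflection, and translation,
# so the comparison is made on pairwise distances rather than on coordinates.
max_error = np.max(np.abs(pairwise_distances(latent) - pairwise_distances(recovered)))
print(max_error)
```

In this noise-free setting the recovered pairwise distances match the latent ones up to floating-point error. Real corpus counts would need the per-word offsets and noise handling that this sketch leaves out.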

Publisher

MIT Press - Journals

Subject

Artificial Intelligence, Computer Science Applications, Linguistics and Language, Human-Computer Interaction, Communication

Link

https://www.mitpressjournals.org/doi/pdf/10.1162/tacl_a_00098

References: 14 articles.

1. Local Limit Theorems for Sequences of Simple Random Walks on Graphs

2. Distributional Structure

3. Improving Distributional Similarity with Lessons Learned from Word Embeddings

4. Diffusion Processes and Riemannian Geometry

Cited by 31 articles.

1. Exploring Federated Learning Tendencies Using a Semantic Keyword Clustering Approach;Information;2024-06-28

2. Unsupervised embedding of trajectories captures the latent structure of scientific migration;Proceedings of the National Academy of Sciences;2023-12-22

3. Two Faces of Novelty: Idea Selection in Crowdsourcing Challenges;SSRN Electronic Journal;2023

4. Distributional Word Vectors as Semantic Maps Framework;Computación y Sistemas;2022-09-05

5. ICAF: Iterative Contrastive Alignment Framework for Multimodal Abstractive Summarization;2022 International Joint Conference on Neural Networks (IJCNN);2022-07-18