Affiliation:
1. School of Computer and Information Sciences, University of Hyderabad, Hyderabad, India
Abstract
Word embeddings have recently become a vital part of many Natural Language Processing (NLP) systems. The term covers a family of techniques that represent the words of a language as vectors in an n-dimensional real space, and these representations have been shown to encode a significant amount of syntactic and semantic information; when used in NLP systems, they improve performance across a wide range of tasks. However, it is not clear how the syntactic properties of words interact with their more widely studied semantic properties, or which factors in the modeling formulation encourage an embedding space to capture syntactic rather than semantic behavior of words. We investigate several aspects of word embedding spaces and of the underlying modeling assumptions that maximize syntactic coherence, that is, the degree to which words with similar syntactic properties form distinct neighborhoods in the embedding space. Our aim is to identify which of the existing models maximize syntactic coherence and are therefore more reliable sources for extracting syntactic category (POS) information. Our analysis shows that the syntactic coherence of S-CODE is superior to that of more popular and more recent embedding techniques such as Word2vec, fastText, GloVe, and LexVec when measured under compatible parameter settings. Our investigation also gives deeper insights into the geometry of the embedding space with respect to syntactic coherence, and into how this geometry is influenced by context size, word frequency, and the dimensionality of the embedding space.
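The abstract does not specify the paper's exact coherence metric. As a rough, hedged illustration only, the sketch below (a hypothetical syntactic_coherence helper using NumPy) scores how often a word's nearest neighbors by cosine similarity share its POS tag, which is one simple way to operationalize the "distinct neighborhoods" notion described above.

```python
import numpy as np

def syntactic_coherence(embeddings, pos_tags, k=10):
    """Average fraction of each word's k nearest neighbors (by cosine
    similarity) that share its POS tag. Illustrative proxy only; not the
    paper's metric."""
    # L2-normalize so dot products equal cosine similarities
    norms = np.linalg.norm(embeddings, axis=1, keepdims=True)
    unit = embeddings / np.clip(norms, 1e-12, None)
    sims = unit @ unit.T
    np.fill_diagonal(sims, -np.inf)  # exclude the word itself
    scores = []
    for i, tag in enumerate(pos_tags):
        neighbors = np.argsort(-sims[i])[:k]
        scores.append(np.mean([pos_tags[j] == tag for j in neighbors]))
    return float(np.mean(scores))

# Toy usage with random vectors for a handful of tagged words
rng = np.random.default_rng(0)
vectors = rng.normal(size=(6, 50))
tags = ["NOUN", "NOUN", "VERB", "VERB", "ADJ", "ADJ"]
print(syntactic_coherence(vectors, tags, k=2))
```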
Publisher
World Scientific Pub Co Pte Ltd
Subject
Artificial Intelligence, Computer Networks and Communications, Computer Science Applications, Linguistics and Language, Information Systems, Software