From Word Types to Tokens and Back: A Survey of Approaches to Word Meaning Representation and Interpretation

Published: 2023-03-14
Pages: 1-59
ISSN: 0891-2017
Container-title: Computational Linguistics
Language: en

Author:

Apidianaki Marianna¹

Affiliation:

1. University of Pennsylvania, Department of Computer and Information Science. marapi@seas.upenn.edu

Abstract

Vector-based word representation paradigms situate lexical meaning at different levels of abstraction. Distributional and static embedding models generate a single vector per word type, which is an aggregate across the instances of the word in a corpus. Contextual language models, on the contrary, directly capture the meaning of individual word instances. The goal of this survey is to provide an overview of word meaning representation methods, and of the strategies that have been proposed for improving the quality of the generated vectors. These often involve injecting external knowledge about lexical semantic relationships, or refining the vectors to describe different senses. The survey also covers recent approaches for obtaining word type-level representations from token-level ones, and for combining static and contextualized representations. Special focus is given to probing and interpretation studies aimed at discovering the lexical semantic knowledge that is encoded in contextualized representations. The challenges posed by this exploration have motivated the interest towards static embedding derivation from contextualized embeddings, and for methods aimed at improving the similarity estimates that can be drawn from the space of contextual language models.

Publisher

MIT Press

Subject

Artificial Intelligence, Computer Science Applications, Linguistics and Language, Language and Linguistics

Link

https://direct.mit.edu/coli/article-pdf/doi/10.1162/coli_a_00474/2074670/coli_a_00474.pdf

Cited by 7 articles.

1. An embedded diachronic sense change model with a case study from ancient Greek;Computational Statistics & Data Analysis;2024-11

2. DG Embeddings: The unsupervised definition embeddings learned from dictionary and glossary to gloss context words of Cloze task;Knowledge-Based Systems;2024-07

3. Predicting Suicidality in Youth Seeking Help from a Crisis Text Line: Development and Validation of an Explainable Transformer-Based Artificial Intelligence (AI) Text Classifier (Preprint);2024-07-01

4. It’s time for a complete theory of partial predictability in language;Theoretical Linguistics;2024-06-01

5. Cooperative Embedding - A Novel Approach to Tackle the Out-Of-Vocabulary Dilemma in Bot Classification;Proceedings of the 39th ACM/SIGAPP Symposium on Applied Computing;2024-04-08