Unseen Word Representation by Aligning Heterogeneous Lexical Semantic Spaces

Published:2019-07-17 Volume:33 Page:6900-6907
ISSN:2374-3468
Container-title:Proceedings of the AAAI Conference on Artificial Intelligence
Short-container-title:AAAI

Author:

Prokhorov Victor, Pilehvar Mohammad Taher, Kartsaklis Dimitri, Lio Pietro, Collier Nigel

Abstract

Word embedding techniques heavily rely on the abundance of training data for individual words. Given the Zipfian distribution of words in natural language texts, a large number of words do not usually appear frequently or at all in the training data. In this paper we put forward a technique that exploits the knowledge encoded in lexical resources, such as WordNet, to induce embeddings for unseen words. Our approach adapts graph embedding and cross-lingual vector space transformation techniques in order to merge lexical knowledge encoded in ontologies with that derived from corpus statistics. We show that the approach can provide consistent performance improvements across multiple evaluation benchmarks: in-vitro, on multiple rare word similarity datasets, and in-vivo, in two downstream text classification tasks.
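
The abstract does not say which vector space transformation the authors use, so the following is only a minimal sketch of one common alignment technique from the cross-lingual literature, orthogonal Procrustes, and not the paper's actual method. It assumes two spaces of equal dimensionality: graph-derived vectors (for example, from a resource such as WordNet) and corpus-derived vectors. The function name, variable names, and toy data are all hypothetical. The map is fitted on words present in both spaces and then applied to a word that only has a graph vector.

```python
import numpy as np


def fit_orthogonal_map(source, target):
    """Return the orthogonal matrix W minimizing ||source @ W - target||_F.

    This is the closed-form orthogonal Procrustes solution: take the SVD
    of source.T @ target and multiply the two orthogonal factors.
    """
    u, _, vt = np.linalg.svd(source.T @ target)
    return u @ vt


# Toy data (hypothetical): 50 words that have vectors in both spaces.
rng = np.random.default_rng(0)
dim = 4
true_rotation, _ = np.linalg.qr(rng.normal(size=(dim, dim)))
graph_vecs_shared = rng.normal(size=(50, dim))
# Assumption for the toy case: the corpus space is an exact rotation
# of the graph space, so the fitted map can be checked against it.
corpus_vecs_shared = graph_vecs_shared @ true_rotation

W = fit_orthogonal_map(graph_vecs_shared, corpus_vecs_shared)

# A word unseen in the corpus has only a graph vector; project it
# into the corpus space to induce an embedding for it.
unseen_graph_vec = rng.normal(size=dim)
induced_corpus_vec = unseen_graph_vec @ W
```

In this toy setup the fitted map recovers the rotation exactly. Real graph and corpus spaces are related only approximately, so the induced vector would be an estimate whose quality depends on how many shared words are available and how well a linear map fits them.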

Publisher

Association for the Advancement of Artificial Intelligence (AAAI)

Subject

General Medicine

Cited by 3 articles.

1. The Impact of Word Splitting on the Semantic Content of Contextualized Word Representations;Transactions of the Association for Computational Linguistics;2024

2. A Complete Process of Text Classification System Using State-of-the-Art NLP Models;Computational Intelligence and Neuroscience;2022-06-09

3. Hide and Seek: Revisiting DNS-based User Tracking;2022 IEEE 7th European Symposium on Security and Privacy (EuroS&P);2022-06