A topological embedding of the lexicon for semantic distance computation
Published: 2010-06-15
Issue: 3
Volume: 16
Page: 245-275
ISSN: 1351-3249
Container-title: Natural Language Engineering
Language: en
Short-container-title: Nat. Lang. Eng.
Author:
DAVIS N., GIRAUD-CARRIER C., JENSEN D.
Abstract
We show how a quantitative context may be established for what is essentially qualitative in nature by topologically embedding a lexicon (here, WordNet) in a complete metric space. This novel transformation establishes a natural connection between the order relation in the lexicon (e.g., hyponymy) and the notion of distance in the metric space, giving rise to effective word-level and document-level lexical semantic distance measures. We provide a formal account of the topological transformation and demonstrate the value of our metrics on several experiments involving information retrieval and document clustering tasks.
Publisher
Cambridge University Press (CUP)
Subject
Artificial Intelligence, Linguistics and Language, Language and Linguistics, Software