
Wisdom of crowds versus wisdom of linguists – measuring the semantic relatedness of words

Published: 2009-09-09, Volume: 16, Issue: 1, Pages: 25-59
ISSN: 1351-3249
Journal: Natural Language Engineering
Language: en
Journal abbreviation: Nat. Lang. Eng.

Authors:

Torsten Zesch, Iryna Gurevych

Abstract

In this article, we present a comprehensive study aimed at computing the semantic relatedness of word pairs. We analyze the performance of a large number of semantic relatedness measures proposed in the literature under different experimental conditions: (i) the datasets employed, (ii) the language (English or German), (iii) the underlying knowledge source, and (iv) the evaluation task (computing semantic relatedness scores, ranking word pairs, or solving word choice problems). To our knowledge, this study is the first to systematically analyze semantic relatedness on a large number of datasets with different properties, while emphasizing the role of the knowledge source, compiled either by the ‘wisdom of linguists’ (i.e., classical wordnets) or by the ‘wisdom of crowds’ (i.e., collaboratively constructed knowledge sources like Wikipedia). The article discusses the benefits and drawbacks of different approaches to evaluating semantic relatedness. We show that results must be interpreted carefully in order to assess particular aspects of semantic relatedness. For the first time, we apply a vector-based measure of semantic relatedness, which relies on a concept space built from documents, to three kinds of documents: the first paragraphs of Wikipedia articles, English WordNet glosses, and GermaNet-based pseudo glosses. Contrary to previous research (Strube and Ponzetto 2006; Gabrilovich and Markovitch 2007; Zesch et al. 2007), we find that ‘wisdom of crowds’-based resources are not superior to ‘wisdom of linguists’-based resources. We also find that using only the first paragraph of a Wikipedia article, as opposed to the whole article, leads to better precision but decreases recall.
Finally, we present two systems that were developed to aid the experiments presented herein and that are freely available for research purposes: (i) DEXTRACT, a software tool for semi-automatically constructing corpus-driven semantic relatedness datasets, and (ii) JWPL, a Java-based, high-performance Wikipedia application programming interface (API) for building natural language processing (NLP) applications.

Publisher

Cambridge University Press (CUP)

Subject

Artificial Intelligence, Linguistics and Language, Language and Linguistics, Software

References (54 articles)

1. Development and application of a metric on semantic nets

2. Evaluating WordNet-based Measures of Lexical Semantic Relatedness

Cited by 55 articles.

1. Evaluating Computational Models of Similarity against a Human Rated Dataset; Baltic Journal of Modern Computing; 2022

2. Evaluation of taxonomic and neural embedding methods for calculating semantic similarity; Natural Language Engineering; 2021-09-28

3. Developing the Persian Wordnet of Verbs Using Supervised Learning; ACM Transactions on Asian and Low-Resource Language Information Processing; 2021-07-31

4. A survey of semantic relatedness evaluation datasets and procedures; Artificial Intelligence Review; 2019-12-23

5. Semantic association computation: a comprehensive survey; Artificial Intelligence Review; 2019-11-20