Affiliation:
1. Sapienza University of Rome
Abstract
Web search result clustering aims to facilitate information search on the Web. Rather than the results of a query being presented as a flat list, they are grouped on the basis of their similarity and subsequently shown to the user as a list of clusters. Each cluster is intended to represent a different meaning of the input query, thus taking into account the lexical ambiguity (i.e., polysemy) issue. Existing Web clustering methods typically rely on some shallow notion of textual similarity between search result snippets, however. As a result, text snippets with no word in common tend to be clustered separately even if they share the same meaning, whereas snippets with words in common may be grouped together even if they refer to different meanings of the input query.In this article we present a novel approach to Web search result clustering based on the automatic discovery of word senses from raw text, a task referred to as Word Sense Induction. Key to our approach is to first acquire the various senses (i.e., meanings) of an ambiguous query and then cluster the search results based on their semantic similarity to the word senses induced. Our experiments, conducted on data sets of ambiguous queries, show that our approach outperforms both Web clustering and search engines.
Subject
Artificial Intelligence,Computer Science Applications,Linguistics and Language,Language and Linguistics
Cited by
79 articles.
订阅此论文施引文献
订阅此论文施引文献,注册后可以免费订阅5篇论文的施引文献,订阅后可以查看论文全部施引文献