Affiliation:
1. Fondazione Ugo Bordoni, Rome, Italy
2. Univ. of Avignon, Avignon, France
Abstract
Techniques for automatic query expansion from top retrieved documents have shown promise for improving retrieval effectiveness on large collections; however, they often rest on empirical grounds, and there is a shortage of cross-system comparisons. Using ideas from Information Theory, we present a computationally simple and theoretically justified method for assigning scores to candidate expansion terms. Such scores are used to select and weight expansion terms within Rocchio's framework for query reweighting. We compare ranking with information-theoretic query expansion versus ranking with other query expansion techniques, showing that the former achieves better retrieval effectiveness on several performance measures. We also discuss the effect on retrieval effectiveness of the main parameters involved in automatic query expansion, such as data sparseness, query difficulty, number of selected documents, and number of selected terms, pointing out interesting relationships.
Publisher
Association for Computing Machinery (ACM)
Subject
Computer Science Applications, General Business, Management and Accounting, Information Systems
Reference
58 articles.
1. AMATI G. AND VAN RIJSBERGEN K. 2000. Probabilistic models of information retrieval based on measuring the divergence from randomness.
2. Local Feedback in Full-Text Retrieval Systems
3. Combined models for topic spotting and topic-dependent language modeling
Cited by
210 articles.