Affiliation:
1. Knowledge Media Laboratory, Toshiba Corporate R&D Center, Kawasaki, JAPAN
Abstract
Although Pseudo-Relevance Feedback (PRF) is a widely used technique for enhancing average retrieval performance, it may actually hurt performance for around one-third of a given set of topics. To enhance the reliability of PRF, Flexible PRF has been proposed, which adjusts the number of pseudo-relevant documents and/or the number of expansion terms for each topic. This paper explores a new, inexpensive Flexible PRF method, called Selective Sampling, which is unique in that it can skip documents in the initial ranked output to look for more “novel” pseudo-relevant documents. While Selective Sampling is only comparable to Traditional PRF in terms of average performance and reliability, per-topic analyses show that Selective Sampling outperforms Traditional PRF almost as often as Traditional PRF outperforms Selective Sampling. Thus, treating the top P documents as relevant is often not the best strategy. However, predicting when Selective Sampling outperforms Traditional PRF appears to be as difficult as predicting when a PRF method fails. For example, our per-topic analyses show that even the proportion of truly relevant documents in the pseudo-relevant set is not necessarily a good performance predictor.
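The contrast between taking the top P documents and skipping ahead for more novel ones can be illustrated with a small sketch. This is not the paper's Selective Sampling algorithm, whose details the abstract does not give. The function names, the Jaccard term-overlap test, and the `max_overlap` threshold are all assumptions chosen only to show the general idea of skipping documents in a ranked list.

```python
def jaccard(a, b):
    """Term-set overlap between two documents (assumed novelty measure)."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def top_p(ranked, p):
    """Traditional PRF: treat the first p ranked documents as pseudo-relevant."""
    return ranked[:p]


def select_novel(ranked, p, terms, max_overlap=0.5):
    """Hypothetical skip-based selection: walk down the ranking and keep a
    document only if it overlaps little with every document already kept.

    ranked      -- document ids in initial rank order
    terms       -- dict mapping document id to its set of terms
    max_overlap -- assumed threshold; not taken from the paper
    """
    chosen = []
    for doc in ranked:
        if len(chosen) == p:
            break
        if all(jaccard(terms[doc], terms[c]) <= max_overlap for c in chosen):
            chosen.append(doc)
    return chosen


# Toy ranking in which d2 is a near-duplicate of d1.
ranked = ["d1", "d2", "d3", "d4"]
terms = {
    "d1": {"a", "b", "c"},
    "d2": {"a", "b", "c"},
    "d3": {"a", "d", "e"},
    "d4": {"f", "g"},
}
traditional = top_p(ranked, 2)
novel = select_novel(ranked, 2, terms)
```

In this toy ranking, the top-2 selection keeps both `d1` and its duplicate `d2`, while the skip-based selection passes over `d2` and keeps `d3` instead.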
Publisher
Association for Computing Machinery (ACM)
Cited by
43 articles.