Affiliation:
1. Department of Information Studies, University of Sheffield, Western Bank, Sheffield S10 2TN, United Kingdom
Abstract
The use of automatic classification techniques has been suggested as a means of increasing the effectiveness of document retrieval systems; however, the automatic generation of a classification requires a large amount of computation, and it is thus of importance to know whether this computation will result in material increases in retrieval performance. This paper describes three methods - the overlap test, the nearest neighbour test and the density test - which can be used to measure the degree of clustering tendency in a set of documents. It is shown that the three tests are not in complete agreement with each other in their evaluation of the degree of clustering tendency present in seven document test collections. A comparison of the predicted degree of clustering tendency with the relative effectiveness of cluster and non-cluster searches suggests that the density test gives the most useful results; it also has the advantage that it does not require query and relevance data and can thus be used in a predictive manner when a document collection is to be processed for the first time.
Subject
Library and Information Sciences, Information Systems
Cited by
26 articles.
1. Cluster-Based Document Retrieval with Multiple Queries;Proceedings of the 2020 ACM SIGIR on International Conference on Theory of Information Retrieval;2020-09-14
2. Testing the Cluster Hypothesis with Focused and Graded Relevance Judgments;The 41st International ACM SIGIR Conference on Research & Development in Information Retrieval;2018-06-27
3. The correlation between cluster hypothesis tests and the effectiveness of cluster-based retrieval;Proceedings of the 37th international ACM SIGIR conference on Research & development in information retrieval;2014-07-03
4. Query-performance prediction for effective query routing in domain-specific repositories;Journal of the Association for Information Science and Technology;2014-04-11
5. The Cluster Hypothesis in Information Retrieval;Lecture Notes in Computer Science;2014