Affiliation
1. Bilkent University, Ankara, Turkey
Abstract
We propose a unique cluster-based retrieval (CBR) strategy using a new cluster-skipping inverted file for improving query processing efficiency. The new inverted file incorporates cluster membership and centroid information along with the usual document information into a single structure. In our incremental-CBR strategy, during query evaluation, both best(-matching) clusters and the best(-matching) documents of such clusters are computed together with a single posting-list access per query term. As we switch from term to term, the best clusters are recomputed and can dynamically change. During query-document matching, only relevant portions of the posting lists corresponding to the best clusters are considered and the rest are skipped. The proposed approach is essentially tailored for environments where inverted files are compressed, and provides substantial efficiency improvement while yielding comparable, or sometimes better, effectiveness figures. Our experiments with various collections show that the incremental-CBR strategy using a compressed cluster-skipping inverted file significantly improves CPU time efficiency, regardless of query length. The new compressed inverted file imposes an acceptable storage overhead in comparison to a typical inverted file. We also show that our approach scales well with the collection size.
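The query evaluation loop described in the abstract can be sketched roughly as follows. This is a minimal illustration, not the authors' implementation. The index layout (each posting list split into per-cluster blocks carrying a centroid weight), the additive scoring, and the names `incremental_cbr`, `example_index`, `num_best_clusters`, and `top_k` are all assumptions made for this example. Compression is omitted, and skipping is modelled by simply not iterating over the blocks of non-best clusters. Documents scored under an earlier term stay in the accumulators even if their cluster later drops out of the best set, which is a simplification.

```python
from collections import defaultdict
from typing import Dict, List, Tuple

# Hypothetical layout (an assumption, not the paper's on-disk format):
# term -> list of (cluster_id, centroid_weight, [(doc_id, term_freq), ...])
ClusterBlock = Tuple[int, float, List[Tuple[int, int]]]
Index = Dict[str, List[ClusterBlock]]


def incremental_cbr(index: Index, query_terms: List[str],
                    num_best_clusters: int, top_k: int) -> List[Tuple[int, float]]:
    """Score clusters and documents together, one posting list per query term."""
    cluster_scores: Dict[int, float] = defaultdict(float)
    doc_scores: Dict[int, float] = defaultdict(float)

    for term in query_terms:
        blocks = index.get(term, [])  # single posting-list fetch for this term

        # Update cluster scores from the centroid information in the list.
        for cluster_id, centroid_weight, _ in blocks:
            cluster_scores[cluster_id] += centroid_weight

        # Recompute the best clusters; the set may change from term to term.
        ranked_clusters = sorted(cluster_scores,
                                 key=lambda c: (-cluster_scores[c], c))
        best = set(ranked_clusters[:num_best_clusters])

        # Match documents only in blocks of best clusters; skip the rest.
        for cluster_id, _, postings in blocks:
            if cluster_id not in best:
                continue  # stands in for following a skip pointer
            for doc_id, term_freq in postings:
                doc_scores[doc_id] += term_freq  # toy additive score

    ranked_docs = sorted(doc_scores.items(), key=lambda kv: (-kv[1], kv[0]))
    return ranked_docs[:top_k]


# Toy data, invented for illustration only.
example_index: Index = {
    "cluster": [(0, 0.9, [(1, 2), (2, 1)]), (1, 0.1, [(7, 5)])],
    "index": [(0, 0.5, [(2, 3)]), (1, 0.2, [(7, 1), (8, 4)])],
}

if __name__ == "__main__":
    print(incremental_cbr(example_index, ["cluster", "index"],
                          num_best_clusters=1, top_k=2))
```

With `num_best_clusters=1`, only the blocks of cluster 0 are scored for both terms, so documents 7 and 8 are never touched. Raising the value to 2 brings them into the ranking at the cost of processing every block.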
Funder
Türkiye Bilimsel ve Teknolojik Araştırma Kurumu
Publisher
Association for Computing Machinery (ACM)
Subject
Computer Science Applications; General Business, Management and Accounting; Information Systems
Cited by
16 articles.
1. Sampling Individually-Fair Rankings that are Always Group Fair;Proceedings of the 2023 AAAI/ACM Conference on AI, Ethics, and Society;2023-08-08
2. Profiling and Visualizing Dynamic Pruning Algorithms;Proceedings of the 46th International ACM SIGIR Conference on Research and Development in Information Retrieval;2023-07-18
3. Exploiting Cluster-Skipping Inverted Index for Semantic Place Retrieval;Proceedings of the 46th International ACM SIGIR Conference on Research and Development in Information Retrieval;2023-07-18
4. Anytime Ranking on Document-Ordered Indexes;ACM Transactions on Information Systems;2022-01-31
5. Relevance- and interface-driven clustering for visual information retrieval;Information Systems;2020-12