Affiliation:
1. University of A Coruña, Spain
Abstract
Information retrieval (IR) systems typically compress their indexes to increase efficiency. Static pruning is a form of lossy data compression: it removes from the index the data estimated to be least important to retrieval performance, according to some criterion. Generally, pruning criteria are derived from term weighting functions, which assign weights to terms according to their contribution to a document's contents. Usually, document-term occurrences that are assigned a low weight are removed from the index, on the assumption that those entries contribute little to the document content.
We present a novel pruning technique that is based on a probabilistic model of IR. We employ the Probability Ranking Principle as a decision criterion for which posting list entries to prune. The proposed approach requires estimating three probabilities, which are combined to provide all the information needed to apply this criterion.
We evaluate our proposed pruning technique on five TREC collections and various retrieval tasks, and show that in almost every situation it outperforms the state of the art in index pruning. The main contribution of this work is proposing a pruning technique that stems directly from the same source as probabilistic retrieval models, and hence is independent of the final model used for retrieval.
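As a rough illustration of the conventional weight-based static pruning described at the start of the abstract (not the probabilistic technique the paper proposes), the following minimal sketch drops posting entries whose weight falls below a threshold. The TF-IDF weighting, the `prune` function, the toy index, and the threshold value are all assumptions made for illustration; they do not come from the paper.

```python
import math


def tf_idf(tf, df, num_docs):
    # Hypothetical weighting function: any term-weighting scheme could stand in here.
    return tf * math.log(num_docs / df)


def prune(index, num_docs, threshold):
    """Return a copy of the index without low-weight document-term entries.

    `index` maps term -> {doc_id: term_frequency}.
    """
    pruned = {}
    for term, postings in index.items():
        df = len(postings)  # document frequency of the term
        kept = {
            doc_id: tf
            for doc_id, tf in postings.items()
            if tf_idf(tf, df, num_docs) >= threshold
        }
        if kept:  # drop terms whose posting list became empty
            pruned[term] = kept
    return pruned


# Toy inverted index over 4 documents (illustrative data only).
index = {
    "pruning": {1: 5, 2: 1},
    "index": {1: 2, 2: 3, 3: 1},
    "the": {1: 9, 2: 8, 3: 7, 4: 6},
}
pruned = prune(index, num_docs=4, threshold=1.0)
print(pruned)
```

In this toy example, a term that occurs in every document gets zero weight and disappears entirely, which shows why the lossy step trades index size against retrieval quality.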
Funder
Ministerio de Ciencia e Innovación
FEDER
Publisher
Association for Computing Machinery (ACM)
Subject
Computer Science Applications; General Business, Management and Accounting; Information Systems
Cited by
18 articles.
1. Diversity-aware strategies for static index pruning;Information Processing & Management;2024-09
2. Neural Passage Quality Estimation for Static Pruning;Proceedings of the 47th International ACM SIGIR Conference on Research and Development in Information Retrieval;2024-07-10
3. An Analysis on Matching Mechanisms and Token Pruning for Late-interaction Models;ACM Transactions on Information Systems;2024-04-29
4. Static Pruning for Multi-Representation Dense Retrieval;Proceedings of the ACM Symposium on Document Engineering 2023;2023-08-22
5. A Static Pruning Study on Sparse Neural Retrievers;Proceedings of the 46th International ACM SIGIR Conference on Research and Development in Information Retrieval;2023-07-18