Efficient set intersection for inverted indexing

Published:2010-12 Issue:1 Volume:29 Page:1-25
ISSN:1046-8188
Container-title:ACM Transactions on Information Systems
language:en
Short-container-title:ACM Trans. Inf. Syst.

Author:

Culpepper J. Shane¹, Moffat Alistair²

Affiliation:

1. RMIT University and The University of Melbourne, Australia

2. The University of Melbourne, Australia

Abstract

Conjunctive Boolean queries are a key component of modern information retrieval systems, especially when Web-scale repositories are being searched. A conjunctive query q is equivalent to a |q|-way intersection over ordered sets of integers, where each set represents the documents containing one of the terms, and each integer in each set is an ordinal document identifier. As is the case with many computing applications, there is tension between the way in which the data is represented, and the ways in which it is to be manipulated. In particular, the sets representing index data for typical document collections are highly compressible, but are processed using random access techniques, meaning that methods for carrying out set intersections must be alert to issues to do with access patterns and data representation. Our purpose in this article is to explore these trade-offs, by investigating intersection techniques that make use of both uncompressed “integer” representations, as well as compressed arrangements. We also propose a simple hybrid method that provides both compact storage, and also faster intersection computations for conjunctive querying than is possible even with uncompressed representations.
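
As an illustration of the |q|-way intersection described in the abstract, the following is a minimal Python sketch, not the hybrid method proposed in the article. It assumes each term's postings are held as an uncompressed, sorted Python list of unique document identifiers, processes the shortest list first, and locates each candidate in the longer lists with a galloping (exponential) search followed by a binary search. The names `gallop_search` and `intersect` are hypothetical and chosen only for this example.

```python
from bisect import bisect_left


def gallop_search(lst, target, lo):
    """Return the smallest index i >= lo with lst[i] >= target, or len(lst)."""
    step = 1
    hi = lo
    # Probe at doubling distances until a value >= target is found
    # or the end of the list is passed.
    while hi < len(lst) and lst[hi] < target:
        lo = hi + 1
        hi += step
        step *= 2
    # The answer lies in lst[lo:hi] or is hi itself; finish with binary search.
    return bisect_left(lst, target, lo, min(hi, len(lst)))


def intersect(postings):
    """Intersect sorted lists of unique document identifiers (assumed input format)."""
    if not postings:
        return []
    # Start from the shortest list so the candidate set is as small as possible.
    ordered = sorted(postings, key=len)
    candidates = list(ordered[0])
    for other in ordered[1:]:
        matched, pos = [], 0
        for doc_id in candidates:
            pos = gallop_search(other, doc_id, pos)
            if pos == len(other):
                break  # no remaining candidate can appear in this list
            if other[pos] == doc_id:
                matched.append(doc_id)
                pos += 1
        candidates = matched
    return candidates


if __name__ == "__main__":
    # Three hypothetical posting lists for a three-term conjunctive query.
    print(intersect([[1, 3, 5, 7, 9, 11], [3, 4, 5, 9, 20], [0, 3, 9, 10]]))
```

The search position `pos` only moves forward within each list, which is the sequential-with-skips access pattern that makes the choice of list representation matter; a compressed list would need some auxiliary structure to support the same jumps.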

Publisher

Association for Computing Machinery (ACM)

Subject

Computer Science Applications; General Business, Management and Accounting; Information Systems

Link

https://dl.acm.org/doi/pdf/10.1145/1877766.1877767

Reference: 31 articles.

1. A Fast Set Intersection Algorithm for Sorted Sequences

2. Faster Adaptive Set Intersections for Text Searching

3. An almost optimal algorithm for unbounded searching

Cited by 78 articles.

1. Data Indexing and Filtering Techniques for Big Data Systems;Resource Management on Distributed Systems;2024-09-06

2. Efficient List Intersection Algorithm for Short Documents by Document Reordering;Mathematics;2024-04-26

3. Information Retrieval Systems in Healthcare;Advances in Healthcare Information Systems and Administration;2024-02-09

4. Privacy-aware document retrieval with two-level inverted indexing;Information Retrieval Journal;2023-11-17

5. Efficient immediate-access dynamic indexing;Information Processing & Management;2023-05