Affiliation:
1. Universidad de Chile, Santiago, Chile
2. University of Waterloo, Waterloo, ON, Canada
Abstract
The intersection of large ordered sets is a common problem in the evaluation of boolean queries to a search engine. In this article, we propose several improved algorithms for computing the intersection of sorted arrays, and in particular for searching sorted arrays in the intersection context. We perform an experimental comparison with the algorithms from the previous studies by Demaine, López-Ortiz, and Munro [ALENEX 2001] and by Baeza-Yates and Salinger [SPIRE 2005]; in addition, we implement and test the intersection algorithm of Barbay and Kenyon [SODA 2002] and its randomized variant [SAGA 2003]. We consider the random data set from Baeza-Yates and Salinger, the Google queries used by Demaine et al., a corpus provided by Google, and a larger corpus from the TREC Terabyte 2006 efficiency query stream, together with its own query log. We measure performance both in terms of the number of comparisons and searches performed and in terms of CPU time on two different architectures. Our results confirm or improve on the results of both previous studies in their respective contexts (the comparison model on real data, and CPU measures on random data) and extend them to new contexts. In particular, we show that value-based search algorithms perform well on posting lists in terms of the number of comparisons performed.
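The abstract contrasts comparison-based searches with value-based searches used inside a multi-list intersection. The Python sketch below is illustrative only and is not the authors' implementation. `adaptive_intersection` cycles through the lists with an "eliminator" element, in the spirit of the Barbay and Kenyon algorithm (the randomized variant is not shown), and takes a pluggable search routine so that the comparison counts of a galloping (doubling) search and a plain interpolation search can be compared. All names (`Counter`, `galloping_search`, `interpolation_search`, `adaptive_intersection`) are hypothetical, and the code assumes each posting list is a strictly increasing list of integer document IDs.

```python
from typing import Callable, List, Sequence, Tuple


class Counter:
    """Tallies element comparisons (hypothetical helper for this sketch)."""

    def __init__(self) -> None:
        self.comparisons = 0


SearchFn = Callable[[Sequence[int], int, int, Counter], int]


def galloping_search(arr: Sequence[int], lo: int, target: int, counter: Counter) -> int:
    """Smallest index i >= lo with arr[i] >= target, or len(arr) if none."""
    n = len(arr)
    if lo >= n:
        return n
    prev, probe, step = lo, lo, 1
    # Doubling phase: probe lo, lo+1, lo+3, lo+7, ... until arr[probe] >= target.
    while probe < n:
        counter.comparisons += 1
        if arr[probe] >= target:
            break
        prev = probe + 1          # everything before prev is < target
        probe += step
        step *= 2
    # Binary search for the first index >= target in arr[prev:hi].
    hi = min(probe, n)
    while prev < hi:
        mid = (prev + hi) // 2
        counter.comparisons += 1
        if arr[mid] < target:
            prev = mid + 1
        else:
            hi = mid
    return prev


def interpolation_search(arr: Sequence[int], lo: int, target: int, counter: Counter) -> int:
    """Value-based search: same contract as galloping_search, probes by interpolation."""
    n = len(arr)
    if lo >= n:
        return n
    hi = n - 1
    counter.comparisons += 1
    if arr[hi] < target:
        return n
    # Invariant: arr[hi] >= target, so the answer lies in [lo, hi].
    while lo < hi:
        counter.comparisons += 1
        if arr[lo] >= target:
            return lo
        # arr[lo] < target <= arr[hi], so the divisor below is positive.
        pos = lo + (target - arr[lo]) * (hi - lo) // (arr[hi] - arr[lo])
        pos = min(pos, hi - 1)    # clamp so every iteration shrinks [lo, hi]
        counter.comparisons += 1
        if arr[pos] < target:
            lo = pos + 1
        else:
            hi = pos
    return lo


def adaptive_intersection(lists: List[List[int]], search: SearchFn) -> Tuple[List[int], int]:
    """Intersect k sorted lists by cycling an eliminator; returns (result, comparisons)."""
    counter = Counter()
    k = len(lists)
    if k == 0 or any(len(lst) == 0 for lst in lists):
        return [], counter.comparisons
    pos = [0] * k                 # next unread index in each list
    result: List[int] = []
    x, pos[0], occurrences = lists[0][0], 1, 1
    i = 1 % k                     # next list to search
    while True:
        if occurrences == k:      # x was found in every list
            result.append(x)
            if pos[i] >= len(lists[i]):
                break
            x = lists[i][pos[i]]  # successor of x becomes the new eliminator
            pos[i] += 1
            occurrences = 1
        else:
            p = search(lists[i], pos[i], x, counter)
            if p == len(lists[i]):
                break             # list i has nothing >= x: no more matches
            counter.comparisons += 1
            if lists[i][p] == x:
                occurrences += 1
            else:
                x, occurrences = lists[i][p], 1
            pos[i] = p + 1
        i = (i + 1) % k
    return result, counter.comparisons
```

For example, `adaptive_intersection([[1, 3, 5, 7, 9, 11], [3, 4, 5, 9, 10], [2, 3, 5, 8, 9]], interpolation_search)` returns the common IDs `[3, 5, 9]` together with a comparison count; swapping in `galloping_search` yields the same IDs with a possibly different count, which is the kind of comparison-model measurement the abstract describes.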
Publisher
Association for Computing Machinery (ACM)
Subject
Theoretical Computer Science
Cited by
27 articles.
1. Looplets: A Language for Structured Coiteration; Proceedings of the 21st ACM/IEEE International Symposium on Code Generation and Optimization; 2023-02-17
2. Autoscheduling for sparse tensor algebra with an asymptotic cost model; Proceedings of the 43rd ACM SIGPLAN International Conference on Programming Language Design and Implementation; 2022-06-09
3. Accelerating All-Edge Common Neighbor Counting on Three Processors; Proceedings of the 48th International Conference on Parallel Processing; 2019-08-05
4. Document reordering for faster intersection; Proceedings of the VLDB Endowment; 2019-01
5. SIMD-Based Multiple Sets Intersection with Dual-Scale Search Algorithm; Proceedings of the 2017 ACM on Conference on Information and Knowledge Management; 2017-11-06