Abstract
This paper studies the optimization of list intersection, especially in the context of the matching phase of search engines. Given a user query, we intersect the postings lists corresponding to the query keywords to generate the list of documents matching all keywords. Since the speed of list intersection depends on the algorithm, the hardware, and the list lengths and their correlations, none of the existing intersection algorithms outperforms the others in every scenario. Therefore, we develop a cost-based approach in which we identify a search space spanning existing algorithms and their combinations. We propose a cost model to estimate the cost of the algorithms and of their combinations, and use the cost model to search for the lowest-cost algorithm. The resulting plan is usually a combination of 2-way algorithms, outperforming conventional 2-way and k-way algorithms. The proposed approach is more general than designing a specific algorithm, as the cost models can be adapted to different hardware. We validate the cost model experimentally on two different CPUs, and show that the cost model closely estimates the actual cost. Using both real and synthetic datasets, we show that the proposed cost-based optimizer outperforms the state-of-the-art alternatives.
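To make the idea of a cost-based choice between 2-way algorithms concrete, the following is a minimal sketch and not the paper's optimizer. It assumes postings lists are sorted lists of unique document IDs. The function names and the two toy cost formulas (`m + n` for a merge, `m * log2(n + 1)` for repeated binary search) are illustrative assumptions; the paper's actual cost model is hardware-aware and is not reproduced here.

```python
import math
from bisect import bisect_left


def merge_intersect(a, b):
    """Intersect two sorted lists with a linear two-pointer merge."""
    i = j = 0
    out = []
    while i < len(a) and j < len(b):
        if a[i] == b[j]:
            out.append(a[i])
            i += 1
            j += 1
        elif a[i] < b[j]:
            i += 1
        else:
            j += 1
    return out


def search_intersect(short, long):
    """Intersect by binary-searching each element of `short` in `long`."""
    out = []
    lo = 0
    for x in short:
        # Resume from the previous position, since both lists are sorted.
        lo = bisect_left(long, x, lo)
        if lo == len(long):
            break
        if long[lo] == x:
            out.append(x)
    return out


def estimated_cost(algorithm, m, n):
    """Toy cost estimate (an assumption, not the paper's cost model)."""
    if algorithm == "merge":
        return m + n
    return m * math.log2(n + 1)


def intersect(a, b):
    """Pick the 2-way algorithm with the lower estimated cost."""
    short, long = (a, b) if len(a) <= len(b) else (b, a)
    m, n = len(short), len(long)
    if estimated_cost("merge", m, n) <= estimated_cost("search", m, n):
        return merge_intersect(short, long)
    return search_intersect(short, long)


def intersect_all(lists):
    """Combine 2-way intersections, processing the shortest lists first."""
    if not lists:
        return []
    ordered = sorted(lists, key=len)
    result = ordered[0]
    for nxt in ordered[1:]:
        result = intersect(result, nxt)
    return result
```

For example, `intersect_all([[1, 3, 5, 7, 9], [3, 4, 5, 9], [0, 3, 9, 12]])` chains two 2-way steps and may use a different algorithm for each step, depending on the list lengths at that point. A real optimizer would also account for hardware effects and for correlations between lists, which this sketch ignores.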
Subject
General Earth and Planetary Sciences; Water Science and Technology; Geography, Planning and Development
Cited by
10 articles.
1. In-depth Analysis of Continuous Subgraph Matching in a Common Delta Query Compilation Framework;Proceedings of the ACM on Management of Data;2024-05-29
2. Efficient immediate-access dynamic indexing;Information Processing & Management;2023-05
3. An Index for Set Intersection With Post-Filtering;IEEE Transactions on Knowledge and Data Engineering;2023
4. Efficient Regular Expression Matching Based on Positional Inverted Index;IEEE Transactions on Knowledge and Data Engineering;2022-03-01
5. Llama;Proceedings of the ACM Symposium on Cloud Computing;2021-11