Affiliation:
1. University of Wisconsin-Madison
Abstract
Recursive query processing has experienced a recent resurgence as a result of its use in many modern application domains, including data integration, graph analytics, security, program analysis, networking, and decision making. Due to the large volumes of data being processed, several research efforts across multiple communities have explored how to scale up recursive queries, typically expressed in Datalog. Our experience with these tools indicates that their performance does not translate across domains; for example, a tool designed for large-scale graph analytics does not exhibit the same performance on program-analysis tasks, and vice versa.
Starting from the above observation, we make the following two contributions. First, we perform a detailed experimental evaluation comparing a number of state-of-the-art Datalog systems on a wide spectrum of graph analytics and program-analysis tasks, and summarize the pros and cons of existing techniques. Second, we design and implement our own general-purpose Datalog engine, called RecStep, on top of a parallel single-node relational system. We outline the techniques we applied in RecStep, as well as the contribution of each technique to the overall performance. Using RecStep as a baseline, we demonstrate that it generally outperforms state-of-the-art parallel Datalog engines on complex and large-scale Datalog evaluation, by a factor of 4-6x. An additional insight from our work is that it is possible to build a high-performance Datalog system on top of a relational engine, an idea that has been dismissed in past work.
Subject
General Earth and Planetary Sciences,Water Science and Technology,Geography, Planning and Development
Cited by
13 articles.
1. Efficient Enumeration of Recursive Plans in Transformation-Based Query Optimizers;Proceedings of the VLDB Endowment;2024-07
2. Adaptive Recursive Query Optimization;2024 IEEE 40th International Conference on Data Engineering (ICDE);2024-05-13
3. Optimizing Nested Recursive Queries;Proceedings of the ACM on Management of Data;2024-03-12
4. Flan: An Expressive and Efficient Datalog Compiler for Program Analysis;Proceedings of the ACM on Programming Languages;2024-01-05
5. Communication-Avoiding Recursive Aggregation;2023 IEEE International Conference on Cluster Computing (CLUSTER);2023-10-31