Affiliation:
1. University of Colorado, Boulder, CO
Abstract
Volcano is a new dataflow query processing system we have developed for database systems research and education. The uniform interface between operators makes Volcano extensible by new operators. All operators are designed and coded as if they were meant for a single-process system only. When attempting to parallelize Volcano, we had to choose between two models of parallelization, called here the bracket and operator models. We describe the reasons for not choosing the bracket model, introduce the novel operator model, and provide details of Volcano's exchange operator that parallelizes all other operators. It allows intra-operator parallelism on partitioned datasets and both vertical and horizontal inter-operator parallelism. The exchange operator encapsulates all parallelism issues and therefore makes implementation of parallel database algorithms significantly easier and more robust. Included in this encapsulation is the translation between demand-driven dataflow within processes and data-driven dataflow between processes. Since the interface between Volcano operators is similar to the one used in “real,” commercial systems, the techniques described here can be used to parallelize other query processing engines.
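To make the idea of the abstract concrete, the following is a minimal sketch in Python, not Volcano's actual code. It assumes an open/next/close iterator interface for every operator, and it uses threads and a bounded queue as stand-ins for the processes mentioned in the abstract. The class names `Scan`, `Filter`, `Exchange`, the helper `run`, the `_END` marker, and the convention that `None` signals end of stream are all hypothetical choices made for illustration.

```python
import queue
import threading

_END = object()  # hypothetical end-of-stream marker placed on the queue


class Scan:
    """Leaf operator over an in-memory list (assumes None is never a row)."""

    def __init__(self, rows):
        self.rows = rows

    def open(self):
        self._it = iter(self.rows)

    def next(self):
        return next(self._it, None)  # None signals end of stream

    def close(self):
        self._it = None


class Filter:
    """Single-threaded operator; it knows nothing about parallelism."""

    def __init__(self, child, predicate):
        self.child = child
        self.predicate = predicate

    def open(self):
        self.child.open()

    def next(self):
        while True:
            row = self.child.next()
            if row is None or self.predicate(row):
                return row

    def close(self):
        self.child.close()


class Exchange:
    """Runs each child plan in its own thread behind the same interface."""

    def __init__(self, children, capacity=4):
        self.children = children
        self.capacity = capacity

    def _produce(self, child):
        # Data-driven side: push rows as fast as the bounded queue allows.
        child.open()
        while (row := child.next()) is not None:
            self._queue.put(row)
        child.close()
        self._queue.put(_END)

    def open(self):
        self._queue = queue.Queue(maxsize=self.capacity)
        self._live = len(self.children)
        self._threads = [
            threading.Thread(target=self._produce, args=(c,))
            for c in self.children
        ]
        for t in self._threads:
            t.start()

    def next(self):
        # Demand-driven side: the consumer pulls one row per call.
        while self._live > 0:
            item = self._queue.get()
            if item is _END:
                self._live -= 1
                continue
            return item
        return None

    def close(self):
        # Drain leftovers so blocked producers can finish, then join them.
        while self.next() is not None:
            pass
        for t in self._threads:
            t.join()


def run(plan):
    """Drive a plan to completion and collect its rows."""
    plan.open()
    out = []
    while (row := plan.next()) is not None:
        out.append(row)
    plan.close()
    return out
```

In this sketch, `Exchange([Filter(Scan(p), pred) for p in partitions])` runs one filter per partition concurrently (intra-operator parallelism on partitioned data), while `Filter` and `Scan` remain written as if for a single thread. The order in which rows arrive from different partitions is not deterministic.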
Publisher
Association for Computing Machinery (ACM)
Subject
Information Systems, Software
Cited by
110 articles.
1. Apache Arrow DataFusion: A Fast, Embeddable, Modular Analytic Query Engine;Companion of the 2024 International Conference on Management of Data;2024-06-09
2. Declarative Sub-Operators for Universal Data Processing;Proceedings of the VLDB Endowment;2023-07
3. LAQy: Efficient and Reusable Query Approximations via Lazy Sampling;Proceedings of the ACM on Management of Data;2023-06-13
4. Sampling-Based AQP in Modern Analytical Engines;Data Management on New Hardware;2022-06-12
5. Cost-efficiency and Performance Robustness in Serverless Data Exchange;Proceedings of the 2022 International Conference on Management of Data;2022-06-10