Reasoning on data partitioning for single-round multi-join evaluation in massively parallel systems

Published:2017-02-21 Issue:3 Volume:60 Page:93-100
ISSN:0001-0782
Container-title:Communications of the ACM
language:en
Short-container-title:Commun. ACM

Author:

Ameloot Tom J.¹, Geck Gaetano², Ketsman Bas¹, Neven Frank¹, Schwentick Thomas²

Affiliation:

1. Hasselt University and Transnational University of Limburg, Hasselt, Belgium

2. TU Dortmund University, Dortmund, Germany

Abstract

Evaluating queries over massive amounts of data is a major challenge in the big data era. Modern massively parallel systems, such as Spark, organize query answering as a sequence of rounds, each consisting of a distinct communication phase followed by a computation phase. The communication phase redistributes data over the available servers, while in the subsequent computation phase each server performs the actual computation on its local data. There is a growing interest in single-round algorithms for evaluating multiway joins where data is first reshuffled over the servers and then evaluated in a parallel but communication-free way. As the amount of communication induced by a reshuffling of the data is a dominating cost in such systems, we introduce a framework for reasoning about data partitioning to detect when we can avoid the data reshuffling step. Specifically, we formalize the decision problems parallel-correctness and transfer of parallel-correctness, provide semantical characterizations, and obtain tight complexity bounds.
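
The single-round scheme described in the abstract can be illustrated with a minimal Python sketch. It is not taken from the paper: the query, the two-server setup, the sample relations, and all function names are assumptions chosen for illustration. It distributes facts according to a policy, evaluates a binary join locally on each server, and compares the union of the local results with the result on the whole database. Note that it tests a single database instance only, whereas parallel-correctness as a decision problem concerns all instances.

```python
def join_rs(R, S):
    """Evaluate Q(x, y, z) :- R(x, y), S(y, z) on one database."""
    return {(x, y, z) for (x, y) in R for (y2, z) in S if y == y2}


def distribute(facts, policy, servers):
    """Give each server the facts that the policy assigns to it."""
    return {s: {f for f in facts if s in policy(f)} for s in servers}


def one_round(R, S, policy_r, policy_s, servers):
    """Reshuffle once, then join locally with no further communication."""
    local_r = distribute(R, policy_r, servers)
    local_s = distribute(S, policy_s, servers)
    result = set()
    for s in servers:
        result |= join_rs(local_r[s], local_s[s])
    return result


# Hypothetical sample data and a two-server cluster.
servers = [0, 1]
R = {(1, 10), (2, 11)}
S = {(10, 8), (11, 7)}

# Reference answer: the join evaluated on the whole database.
expected = join_rs(R, S)

# Policy A: place both relations by the join attribute y.
by_join_key = one_round(
    R, S,
    policy_r=lambda f: {f[1] % 2},  # R(x, y) goes to server y mod 2
    policy_s=lambda f: {f[0] % 2},  # S(y, z) goes to server y mod 2
    servers=servers,
)

# Policy B: place R by x and S by z, ignoring the join attribute.
by_other = one_round(
    R, S,
    policy_r=lambda f: {f[0] % 2},  # R(x, y) goes to server x mod 2
    policy_s=lambda f: {f[1] % 2},  # S(y, z) goes to server z mod 2
    servers=servers,
)

print("policy A matches global result:", by_join_key == expected)
print("policy B matches global result:", by_other == expected)
```

On this sample, policy A places every pair of joining facts on a common server, so the union of the local results equals the global answer. Policy B separates the joining facts, so the local joins miss them. If the data were already partitioned as in policy A, the reshuffling step could be skipped for this query.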

Funder

Research Foundation Flanders

Publisher

Association for Computing Machinery (ACM)

Subject

General Computer Science

Link

https://dl.acm.org/doi/pdf/10.1145/3041063

References (23 articles)

1. Optimizing Multiway Joins in a Map-Reduce Environment

2. Ameloot T.J., Geck G., Ketsman B., Neven F., Schwentick T. Parallel-correctness and transferability for conjunctive queries. Submitted for journal publication (2015).

3. Communication steps for parallel query processing

4. Skew in parallel query processing

Cited by 4 articles.

1. Combined application of ideological and political education and big data Internet technology in the context of education reform;Applied Mathematics and Nonlinear Sciences;2023-06-03

2. Split-Correctness in Information Extraction;Proceedings of the 38th ACM SIGMOD-SIGACT-SIGAI Symposium on Principles of Database Systems - PODS '19;2019

3. Bank Big Data Architecture Based on Massive Parallel Processing Database;2018 15th International Symposium on Pervasive Systems, Algorithms and Networks (I-SPAN);2018-10

4. Parallel-Correctness and Transferability for Conjunctive Queries;Journal of the ACM;2017-10-15