Affiliation:
1. LIFO, Université d'Orléans, BP 6759, 45067 Orléans cedex 2, France
Abstract
Most standard parallel join algorithms try to overcome data skew with a relatively static approach: the way they distribute data (and hence computation) over nodes depends on a data redistribution algorithm (hashing or range partitioning) that is fixed before the actual join begins. In contrast, our approach pre-scans the data in order to choose an efficient join method for each value of the join attribute. This approach has already proved efficient, both theoretically and practically, in our previous papers. In this paper we introduce a new pipelined version of our frequency-adaptive join algorithm. Pipelining offers flexible strategies for resource allocation while avoiding unnecessary disk input/output of intermediate join results when computing multi-join queries. We present a detailed version of the algorithm and a cost analysis based on the BSP (Bulk Synchronous Parallel) model, showing that our pipelined algorithm achieves noticeable improvements over the sequential parallel version for multi-join queries while guaranteeing perfect balancing properties.
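The idea of choosing a join method per join-attribute value after a pre-scan can be illustrated with a small single-process simulation. This is only a rough sketch under stated assumptions, not the algorithm from the paper: the function name `frequency_adaptive_join`, the `threshold` parameter, the round-robin scattering rule, and the simulated "nodes" are all hypothetical choices made for illustration, and the sketch covers neither pipelining nor the BSP cost model.

```python
from collections import Counter, defaultdict


def frequency_adaptive_join(r, s, num_nodes=4, threshold=3):
    """Join r and s (lists of (key, payload)) over simulated nodes.

    Illustrative only: `threshold` and the placement rules below are
    assumptions, not the rules used in the paper.
    Returns (joined tuples, number of output tuples produced per node).
    """
    # Pre-scan: frequency of each join-attribute value in both relations.
    hist_r = Counter(key for key, _ in r)
    hist_s = Counter(key for key, _ in s)
    # Only values present in both relations can produce output.
    common = hist_r.keys() & hist_s.keys()

    frags_r = [defaultdict(list) for _ in range(num_nodes)]
    frags_s = [defaultdict(list) for _ in range(num_nodes)]

    def distribute(rel, frags, side):
        cursor = 0  # round-robin position for scattered tuples
        for key, payload in rel:
            if key not in common:
                continue
            if max(hist_r[key], hist_s[key]) < threshold:
                # Infrequent value: plain hash partitioning.
                targets = [hash(key) % num_nodes]
            elif (hist_r[key] >= hist_s[key]) == (side == "r"):
                # Frequent value, larger side: scatter over all nodes.
                targets = [cursor % num_nodes]
                cursor += 1
            else:
                # Frequent value, smaller side: replicate to every node.
                targets = range(num_nodes)
            for node in targets:
                frags[node][key].append(payload)

    distribute(r, frags_r, "r")
    distribute(s, frags_s, "s")

    # Each node joins its local fragments independently.
    results, per_node = [], []
    for node in range(num_nodes):
        local = [
            (key, a, b)
            for key, payloads in frags_r[node].items()
            for a in payloads
            for b in frags_s[node].get(key, [])
        ]
        per_node.append(len(local))
        results.extend(local)
    return results, per_node
```

In this sketch a heavily repeated value no longer lands on a single node, as it would under pure hashing; the larger side is spread out and only the smaller side is copied, so the output work for that value is shared across nodes.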
Publisher
World Scientific Pub Co Pte Lt
Subject
Hardware and Architecture, Theoretical Computer Science, Software
Cited by
8 articles.