Affiliation:
1. Columbia University, New York, NY
Abstract
Data partitioning is a critical operation for manipulating large datasets because it subdivides tasks into pieces that are more amenable to efficient processing. It is often the limiting factor in database performance and represents a significant fraction of the overall runtime of large data queries. This article measures the performance and energy consumption of state-of-the-art software partitioners, and describes and evaluates a hardware range partitioner that further improves efficiency.
The software implementation is broken into two phases, allowing separate analysis of the partition function computation and data shuffling costs. Although range partitioning is commonly thought to be more expensive than simpler strategies such as hash partitioning, our measurements indicate that careful data movement and optimization of the partition function can allow it to approach the throughput and energy consumption of hash or radix partitioning.
For further acceleration, we describe a hardware range partitioner (HARP); a streaming framework that offers a seamless execution environment for this and other streaming accelerators; and a detailed analysis of a 32 nm physical design that matches the throughput of four to eight software threads while consuming just 6.9% of the area and 4.3% of the power of a Xeon core in the same technology generation.
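As a rough illustration of the two-phase software structure mentioned in the abstract (partition function computation, then data shuffling), the following is a minimal sketch and not the authors' implementation. The function names, the use of binary search over sorted splitters, and the convention that a key equal to a splitter goes to the upper partition are all assumptions made for this example.

```python
from bisect import bisect_right


def partition_function(keys, splitters):
    """Phase 1 (assumed form): map each key to a partition index.

    `splitters` must be sorted. A key equal to a splitter is assigned to
    the partition above it (an arbitrary convention for this sketch).
    """
    return [bisect_right(splitters, key) for key in keys]


def shuffle(records, indices, num_partitions):
    """Phase 2 (assumed form): move each record into its assigned partition."""
    partitions = [[] for _ in range(num_partitions)]
    for record, index in zip(records, indices):
        partitions[index].append(record)
    return partitions


def range_partition(keys, splitters):
    """Run both phases; k splitters yield k + 1 partitions."""
    indices = partition_function(keys, splitters)
    return shuffle(keys, indices, len(splitters) + 1)


if __name__ == "__main__":
    print(range_partition([5, 1, 9, 3, 7], [3, 7]))
```

Keeping the two phases as separate functions mirrors the abstract's point that the cost of computing the partition function and the cost of moving the data can be analyzed independently.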
Funder
Oracle
Division of Information and Intelligent Systems
Division of Computing and Communication Foundations
Publisher
Association for Computing Machinery (ACM)
Cited by
4 articles.
1. Compositional Dataflow Circuits;ACM Transactions on Embedded Computing Systems;2019-01-31
2. Compositional dataflow circuits;Proceedings of the 15th ACM-IEEE International Conference on Formal Methods and Models for System Design;2017-09-29
3. Adaptive metadata rebalance in exascale file system;The Journal of Supercomputing;2016-07-09
4. Integrating frequent pattern clustering and branch-and-bound approaches for data partitioning;Information Sciences;2016-01