
AdaptDB

Published: 2017-01 | Issue: 5 | Volume: 10 | Pages: 589-600
ISSN: 2150-8097
Container title: Proceedings of the VLDB Endowment
Language: en
Short container title: Proc. VLDB Endow.

Author:

Lu Yi¹, Shanbhag Anil¹, Jindal Alekh², Madden Samuel¹

Affiliation:

1. MIT CSAIL

2. Microsoft

Abstract

Big data analytics often involves complex join queries over two or more tables. Such join processing is expensive in a distributed setting, both because large amounts of data must be read from disk and because data must be shuffled across the network. Many techniques based on data partitioning have been proposed to reduce the amount of data that must be accessed, but they often focus on finding the best partitioning scheme for a particular workload rather than adapting to changes in the workload over time. In this paper, we present AdaptDB, an adaptive storage manager for analytical database workloads in a distributed setting. It works by partitioning datasets across a cluster and incrementally refining the partitioning as queries are run. AdaptDB introduces a novel hyper-join that avoids expensive data shuffling by identifying storage blocks of the joining tables that overlap on the join attribute, and joining only those blocks. Hyper-join performs well when each block in one table overlaps with few blocks in the other table, since this minimizes the number of blocks that have to be accessed. To minimize the number of overlapping blocks for common join queries, AdaptDB uses smooth repartitioning to repartition small portions of the tables on join attributes as queries run. A prototype of AdaptDB running on top of Spark improves query performance by 2-3x on TPC-H as well as on a real-world dataset, compared with a system that employs scans and shuffle joins.
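The core idea behind hyper-join, joining only block pairs whose join-attribute ranges overlap, can be illustrated with a small single-machine sketch. It assumes each block's minimum and maximum join-key values are known (here computed from the rows; a real storage manager would presumably keep them as block metadata). The names `Block`, `overlaps`, `overlapping_pairs`, and `hyper_join_sketch`, and the toy `orders`/`lineitem` data, are hypothetical illustrations, not AdaptDB's code; the sketch does not model how AdaptDB groups blocks, schedules work on Spark, or performs smooth repartitioning.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass
class Block:
    """Hypothetical storage block: rows of (join_key, payload) for one table."""
    block_id: str
    rows: list

    @property
    def lo(self):
        # Smallest join-attribute value in the block.
        return min(key for key, _ in self.rows)

    @property
    def hi(self):
        # Largest join-attribute value in the block.
        return max(key for key, _ in self.rows)


def overlaps(a, b):
    """Closed ranges [a.lo, a.hi] and [b.lo, b.hi] intersect iff neither ends before the other starts."""
    return a.lo <= b.hi and b.lo <= a.hi


def overlapping_pairs(left, right):
    """All (left_block, right_block) pairs that could contain matching join keys."""
    return [(l, r) for l in left for r in right if overlaps(l, r)]


def hyper_join_sketch(left, right):
    """Hash-join only overlapping block pairs; non-overlapping pairs cannot produce matches."""
    indexes = {}  # right block_id -> hash index, built only for blocks that are actually read
    results = []
    for l, r in overlapping_pairs(left, right):
        if r.block_id not in indexes:
            index = defaultdict(list)
            for key, payload in r.rows:
                index[key].append(payload)
            indexes[r.block_id] = index
        for key, payload in l.rows:
            for other in indexes[r.block_id].get(key, []):
                results.append((key, payload, other))
    return results


# Toy data (hypothetical): two blocks on one side, three on the other.
orders = [Block("o1", [(1, "o-a"), (5, "o-b")]),
          Block("o2", [(10, "o-c"), (14, "o-d")])]
lineitem = [Block("l1", [(1, "l-a"), (3, "l-b")]),
            Block("l2", [(5, "l-c"), (9, "l-d")]),
            Block("l3", [(14, "l-e"), (20, "l-f")])]

print(len(overlapping_pairs(orders, lineitem)))  # 3 of the 6 possible block pairs
print(hyper_join_sketch(orders, lineitem))
# [(1, 'o-a', 'l-a'), (5, 'o-b', 'l-c'), (14, 'o-d', 'l-e')]
```

Because a matching key must lie inside both blocks' ranges, skipping non-overlapping pairs never drops a result. The fewer blocks each block overlaps, the fewer pairs are read, which is the condition under which the abstract says hyper-join performs well.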

Publisher

VLDB Endowment

Subject

General Earth and Planetary Sciences; Water Science and Technology; Geography, Planning and Development

Link

https://dl.acm.org/doi/pdf/10.14778/3055540.3055551

Cited by 36 articles.

1. Enhancing Storage Efficiency and Performance: A Survey of Data Partitioning Techniques. Journal of Computer Science and Technology, 2024-03.

2. Dynamic trajectory partition optimization method based on historical trajectory data. Applied Soft Computing, 2024-01.

3. AVPS: Automatic Vertical Partitioning for Dynamic Workload. Lecture Notes in Computer Science, 2024.

4. Automatic Database Knob Tuning: A Survey. IEEE Transactions on Knowledge and Data Engineering, 2023-12-01.

5. Plexus. Proceedings of the 2023 ACM Symposium on Cloud Computing, 2023-10-30.