Compaction management in distributed key-value datastores

Published:2015-04 Issue:8 Volume:8 Page:850-861
ISSN:2150-8097
Container-title:Proceedings of the VLDB Endowment
language:en
Short-container-title:Proc. VLDB Endow.

Author:

Ahmad Muhammad Yousuf¹, Kemme Bettina¹

Affiliation:

1. McGill University

Abstract

Compactions are a vital maintenance mechanism used by datastores based on the log-structured merge-tree to counter the continuous buildup of data files under update-intensive workloads. While compactions help keep read latencies in check over the long run, this comes at the cost of significantly degraded read performance over the course of the compaction itself. In this paper, we offer an in-depth analysis of compaction-related performance overheads and propose techniques for their mitigation. We offload large, expensive compactions to a dedicated compaction server to allow the datastore server to better utilize its resources towards serving the actual workload. Moreover, since the newly compacted data is already cached in the compaction server's main memory, we fetch this data over the network directly into the datastore server's local cache, thereby avoiding the performance penalty of reading it back from the filesystem. In fact, pre-fetching the compacted data from the remote cache prior to switching the workload over to it can eliminate local cache misses altogether. Therefore, we implement a smarter warmup algorithm that ensures that all incoming read requests are served from the datastore server's local cache even as it is warming up. We have integrated our solution into HBase, and using the YCSB and TPC-C benchmarks, we show that our approach significantly mitigates compaction-related performance problems. We also demonstrate the scalability of our solution by distributing compactions across multiple compaction servers.
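As a rough illustration of the operation the abstract is concerned with, the sketch below merges several sorted data files ("runs") into one, keeping only the newest version of each key. It is a minimal sketch of a generic LSM-tree major compaction, not the authors' HBase implementation or their offloading and cache warmup scheme. The names `compact` and `TOMBSTONE`, and the representation of a run as a list of `(key, value)` pairs, are assumptions made for this example.

```python
import heapq

# Hypothetical delete marker; real datastores use a dedicated tombstone record.
TOMBSTONE = None

def compact(runs):
    """Merge sorted runs into a single sorted run.

    `runs` is ordered oldest to newest. Each run is a list of (key, value)
    pairs sorted by key, with each key appearing at most once per run.
    """
    # Tag every entry with the negated run index so that, among equal keys,
    # the entry from the newest run sorts first.
    tagged = [
        [(key, -seq, value) for key, value in run]
        for seq, run in enumerate(runs)
    ]
    merged = []
    last_key = object()  # sentinel that compares unequal to every real key
    for key, _neg_seq, value in heapq.merge(*tagged, key=lambda e: (e[0], e[1])):
        if key == last_key:
            continue  # an older version of a key already handled
        last_key = key
        # Dropping deleted keys is only safe because all runs are merged here
        # (a "major" compaction); a partial merge would have to keep the marker.
        if value is not TOMBSTONE:
            merged.append((key, value))
    return merged

if __name__ == "__main__":
    oldest = [("a", "1"), ("b", "2"), ("c", "3")]
    middle = [("b", "20"), ("d", "4")]
    newest = [("a", TOMBSTONE), ("e", "5")]
    print(compact([oldest, middle, newest]))
```

In this sketch the whole merge runs in memory on one machine. The paper's contribution concerns where and when such a merge runs, and how its output reaches the serving node's cache, which this example does not attempt to model.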

Publisher

VLDB Endowment

Subject

General Earth and Planetary Sciences; Water Science and Technology; Geography, Planning and Development

Link

https://dl.acm.org/doi/pdf/10.14778/2757807.2757810

Cited by 40 articles.

1. D²Comp: Efficient Offload of LSM-tree Compaction with Data Processing Units on Disaggregated Storage;ACM Transactions on Architecture and Code Optimization;2024-09-14

2. Structural Designs Meet Optimality: Exploring Optimized LSM-tree Structures in a Colossal Configuration Space;Proceedings of the ACM on Management of Data;2024-05-29

3. CaaS-LSM: Compaction-as-a-Service for LSM-based Key-Value Stores in Storage Disaggregated Infrastructure;Proceedings of the ACM on Management of Data;2024-05-29

4. Increase Merge Efficiency in LSM Trees Through Coordinated Partitioning of Sorted Runs;2023 IEEE International Conference on Big Data (BigData);2023-12-15

5. Learning to Optimize LSM-trees: Towards A Reinforcement Learning based Key-Value Store for Dynamic Workloads;Proceedings of the ACM on Management of Data;2023-11-13