Abstract
A significant fraction of data in cloud storage is rarely accessed; such data is referred to as cold data. Accurately identifying cold data and managing it efficiently on cost-effective storage is one of the major challenges for cloud providers, who must balance reducing cost against improving system performance. To this end, we propose SA-LSM, which applies (S)urvival (A)nalysis to Log-Structured Merge-tree (LSM-tree) key-value (KV) stores. Conventionally, the data layout of an LSM-tree is determined jointly by write and compaction operations. However, by default this process does not fully utilize the access information of data records, leading to a suboptimal data layout that degrades system performance.
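To make the point about access information concrete, here is a minimal sketch of a compaction-style merge that routes each surviving record to a hot or a cold output according to a pluggable predicate. It is an illustration under stated assumptions, not X-Engine's or SA-LSM's code: the record tuple `(key, seqno, value)`, the functions `compact` and `recency_predicate`, and the fixed idle-time rule are all hypothetical, and deletes/tombstones are ignored.

```python
import heapq
from typing import Callable, Dict, List, Tuple

# Hypothetical record shape: (user key, sequence number, value).
Record = Tuple[str, int, str]


def compact(runs: List[List[Record]],
            is_cold: Callable[[str], bool]) -> Tuple[List[Record], List[Record]]:
    """Merge sorted runs, keep only the newest version of each key,
    and split the survivors into hot and cold outputs (hypothetical API)."""
    hot: List[Record] = []
    cold: List[Record] = []
    last_key = None
    # Each run must already be sorted by (key ascending, seqno descending);
    # heapq.merge then yields the newest version of every key first.
    for key, seqno, value in heapq.merge(*runs, key=lambda r: (r[0], -r[1])):
        if key == last_key:
            continue  # older version shadowed by a newer one: drop it
        last_key = key
        (cold if is_cold(key) else hot).append((key, seqno, value))
    return hot, cold


def recency_predicate(last_access: Dict[str, float], now: float,
                      idle: float) -> Callable[[str], bool]:
    """Naive stand-in predictor: cold if idle longer than `idle`.
    Keys with no recorded access are treated as cold."""
    def is_cold(key: str) -> bool:
        return now - last_access.get(key, float("-inf")) > idle
    return is_cold
```

A conventional compaction would write both outputs to the same place; the `is_cold` predicate marks where a cold-data predictor could plug in instead of a fixed idle-time rule.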
SA-LSM uses survival analysis, a statistical learning technique commonly used in biostatistics, to optimize the data layout. Adapted appropriately to the LSM-tree setting, SA-LSM can accurately predict cold data from historical semantic information and access traces. As a concrete realization, we implement our proposal in X-Engine, a commercial-strength open-source LSM-tree storage engine. To make deployment more flexible, we also design a non-intrusive architecture that offloads CPU-intensive work, e.g., model training and inference, to an external service. Extensive experiments on real-world workloads show that SA-LSM can decrease tail latency by up to 78.9% compared to state-of-the-art techniques. The generality of this approach and its significant performance improvement suggest strong potential for a variety of related applications.
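The abstract does not say which survival model SA-LSM uses. As a hedged illustration of the general idea only, the sketch below applies the Kaplan-Meier estimator, a standard nonparametric survival estimator, to gaps between successive accesses of a key: each completed gap is an observed event (the key was accessed again), and the final gap of each trace is right-censored at the end of the observation window. A key already idle for `idle_for` is then labeled cold if the conditional probability of no access in the next `horizon` time units, S(idle_for + horizon) / S(idle_for), is at least `threshold`. The function names, the pooling of gaps across keys, and the threshold rule are assumptions for illustration.

```python
from typing import List, Sequence, Tuple

Sample = Tuple[float, bool]  # (duration, observed); observed=False means right-censored


def gaps_from_trace(access_times: Sequence[float], trace_end: float) -> List[Sample]:
    """Turn one key's access timestamps into survival samples (hypothetical helper)."""
    ts = sorted(access_times)
    # Each gap between consecutive accesses ended in a re-access: an observed event.
    samples = [(b - a, True) for a, b in zip(ts, ts[1:])]
    if ts:
        # No re-access seen after the last one before the trace ended: censored.
        samples.append((trace_end - ts[-1], False))
    return samples


def kaplan_meier(samples: List[Sample]) -> List[Tuple[float, float]]:
    """Return the step points (t, S(t)) of the Kaplan-Meier survival curve."""
    ordered = sorted(samples)
    at_risk = len(ordered)
    surv = 1.0
    curve: List[Tuple[float, float]] = []
    i = 0
    while i < len(ordered):
        t = ordered[i][0]
        events = removed = 0
        # Group ties at time t; samples censored at t still count as at risk at t.
        while i < len(ordered) and ordered[i][0] == t:
            events += int(ordered[i][1])
            removed += 1
            i += 1
        if events:
            surv *= 1.0 - events / at_risk
            curve.append((t, surv))
        at_risk -= removed
    return curve


def survival_at(curve: List[Tuple[float, float]], t: float) -> float:
    """Evaluate the right-continuous step function S(t); S(t) = 1 before the first event."""
    s = 1.0
    for ti, si in curve:
        if ti > t:
            break
        s = si
    return s


def is_cold(curve: List[Tuple[float, float]], idle_for: float,
            horizon: float, threshold: float = 0.9) -> bool:
    """Cold if P(no access within `horizon` | already idle for `idle_for`) >= threshold."""
    s_now = survival_at(curve, idle_for)
    if s_now == 0.0:
        # No observed gap lasted this long; treating the key as cold is an assumption.
        return True
    return survival_at(curve, idle_for + horizon) / s_now >= threshold
```

In use, one would pool `gaps_from_trace(...)` samples over many keys (or over keys sharing some semantic attribute), fit the curve once with `kaplan_meier`, and query `is_cold` per key with its current idle time. Handling censoring, rather than simply discarding unfinished gaps, is what distinguishes this from a plain histogram of inter-access times.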
Publisher
Association for Computing Machinery (ACM)
Subject
General Earth and Planetary Sciences, Water Science and Technology, Geography, Planning and Development
Cited by
7 articles.