Automating distributed tiered storage management in cluster computing

Published:2019-09-15 Issue:1 Volume:13 Page:43-56
ISSN:2150-8097
Container-title:Proceedings of the VLDB Endowment
language:en
Short-container-title:Proc. VLDB Endow.

Author:

Herodotou Herodotos¹, Kakoulli Elena¹

Affiliation:

1. Cyprus University of Technology, Limassol, Cyprus

Abstract

Data-intensive platforms such as Hadoop and Spark are routinely used to process massive amounts of data residing on distributed file systems like HDFS. Increasing memory sizes and new hardware technologies (e.g., NVRAM, SSDs) have recently led to the introduction of storage tiering in such settings. However, users are now burdened with the additional complexity of managing the multiple storage tiers and the data residing on them while trying to optimize their workloads. In this paper, we develop a general framework for automatically moving data across the available storage tiers in distributed file systems. Moreover, we employ machine learning for tracking and predicting file access patterns, which we use to decide when and which data to move up or down the storage tiers for increasing system performance. Our approach uses incremental learning to dynamically refine the models with new file accesses, allowing them to naturally adjust and adapt to workload changes over time. Our extensive evaluation using realistic workloads derived from Facebook and CMU traces compares our approach with several other policies and showcases significant benefits in terms of both workload performance and cluster efficiency.
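
The idea of incrementally learning file access patterns and using the predictions to move files between tiers can be illustrated with a minimal sketch. This is not the paper's actual model, feature set, or policy: the class `OnlineAccessPredictor`, the two features (normalized recency and frequency), and the thresholds in `decide` are all hypothetical, and a plain online logistic regression stands in for whatever learner the authors use.

```python
import math


class OnlineAccessPredictor:
    """Hypothetical stand-in model: logistic regression trained one sample at a time."""

    def __init__(self, n_features, lr=0.5):
        self.w = [0.0] * n_features
        self.b = 0.0
        self.lr = lr

    def predict(self, x):
        # Estimated probability that the file is accessed in the next time window.
        z = self.b + sum(wi * xi for wi, xi in zip(self.w, x))
        return 1.0 / (1.0 + math.exp(-z))

    def update(self, x, accessed):
        # One stochastic-gradient step on the log loss for a single observation,
        # so the model is refined as each new file access (or non-access) is seen.
        err = self.predict(x) - (1.0 if accessed else 0.0)
        self.w = [wi - self.lr * err * xi for wi, xi in zip(self.w, x)]
        self.b -= self.lr * err


def decide(p, up=0.7, down=0.3):
    """Map a predicted access probability to a tier movement (thresholds are assumed)."""
    if p >= up:
        return "upgrade"    # move to a faster tier, e.g. memory or SSD
    if p <= down:
        return "downgrade"  # move to a slower tier, e.g. HDD
    return "stay"


# Toy training stream: features are [recency, frequency], both scaled to [0, 1].
model = OnlineAccessPredictor(n_features=2)
for _ in range(200):
    model.update([1.0, 1.0], accessed=True)    # recently and frequently used file
    model.update([0.0, 0.0], accessed=False)   # cold file

print(decide(model.predict([1.0, 1.0])))
print(decide(model.predict([0.0, 0.0])))
```

Because `update` consumes one observation at a time, the same model can keep adapting if the workload shifts, which is the property the abstract attributes to incremental learning.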

Publisher

VLDB Endowment

Subject

General Earth and Planetary Sciences; Water Science and Technology; Geography, Planning and Development

Link

https://dl.acm.org/doi/pdf/10.14778/3357377.3357381

Cited by 17 articles.

1. Cost-based Data Prefetching and Scheduling in Big Data Platforms over Tiered Storage Systems;ACM Transactions on Database Systems;2023-11-13

2. Streaming Machine Learning for Supporting Data Prefetching in Modern Data Storage Systems;Proceedings of the First Workshop on AI for Systems;2023-08-10

3. Profiling Hyperscale Big Data Processing;Proceedings of the 50th Annual International Symposium on Computer Architecture;2023-06-17

4. FEC: Efficient Deep Recommendation Model Training with Flexible Embedding Communication;Proceedings of the ACM on Management of Data;2023-06-13

5. Adaptive Intelligent Tiering for modern storage systems;Performance Evaluation;2023-05