Affiliation:
1. Huazhong University of Science and Technology, Wuhan, China
2. Chinese University of Hong Kong, Hong Kong, China
Abstract
Repair performance in hierarchical data centers is often bottlenecked by cross-rack network transfer. Recent theoretical results show that the cross-rack repair traffic can be minimized through repair layering, whose idea is to partition a repair operation into inner-rack and cross-rack layers. However, how repair layering should be implemented and deployed in practice remains an open issue. In this article, we address this issue by proposing a practical repair layering framework called DoubleR. We design two families of practical double regenerating codes (DRC), which not only minimize the cross-rack repair traffic but also have several practical properties that improve state-of-the-art regenerating codes. We implement and deploy DoubleR atop the Hadoop Distributed File System (HDFS) and show that DoubleR maintains the theoretical guarantees of DRC and improves the repair performance of regenerating codes in both node recovery and degraded read operations.
Funder
National Natural Science Foundation of China
Hubei Provincial Natural Science Foundation of China
Research Grants Council of Hong Kong
Key Laboratory of Information Storage System Ministry of Education of China
Fundamental Research Funds for the Central Universities
Publisher
Association for Computing Machinery (ACM)
Subject
Hardware and Architecture
Reference
58 articles.
1. Marcos K. Aguilera. 2013. Geo-distributed storage in data centers. Slides presented at the International Conference on Principles of Distributed Systems (OPODIS’13).
Cited by
49 articles.
1. Achieving Tunable Erasure Coding with Cluster-Aware Redundancy Transitioning;ACM Transactions on Architecture and Code Optimization;2024-09-14
2. Designing Non-uniform Locally Repairable Codes for Wide Stripes under Skewed File Accesses;Proceedings of the 53rd International Conference on Parallel Processing;2024-08-12
3. HGR: A Hybrid Global Graph-Based Recovery Approach for Cloud Storage Systems with Failure and Straggler Nodes;2024 IEEE 44th International Conference on Distributed Computing Systems (ICDCS);2024-07-23
4. Optimal Wide Stripe Generation in Locally Repairable Codes via Staged Stripe Merging;2024 IEEE 44th International Conference on Distributed Computing Systems (ICDCS);2024-07-23
5. Toward Optimal Repair and Load Balance in Locally Repairable Codes;Proceedings of the 52nd International Conference on Parallel Processing;2023-08-07