On Fault Tolerance, Locality, and Optimality in Locally Repairable Codes

Published: 2020-06-13 Issue: 2 Volume: 16 Page: 1-32
ISSN: 1553-3077
Container-title: ACM Transactions on Storage
Language: en
Short-container-title: ACM Trans. Storage

Author:

Kolosov Oleg¹, Yadgar Gala², Liram Matan², Tamo Itzhak¹, Barg Alexander³

Affiliation:

1. Tel Aviv University, Israel

2. Technion, Israel

3. University of Maryland, USA, and IITP, Russian Academy of Sciences, Moscow, Russia

Abstract

Erasure codes in large-scale storage systems allow recovery of data from a failed node. A recently developed class of codes, locally repairable codes (LRCs), offers tradeoffs between storage overhead and repair cost. LRCs facilitate efficient recovery scenarios by adding parity blocks to the system. However, these additional blocks may eventually increase the number of blocks that must be reconstructed. Existing LRCs differ in their use of the parity blocks, in their locality semantics, and in their parameter space. Thus, existing theoretical models cannot directly compare different LRCs to determine which code offers the best recovery performance, and at what cost. We perform the first systematic comparison of existing LRC approaches. We analyze Xorbas, Azure’s LRCs, and Optimal-LRCs in light of two new metrics: average degraded read cost and normalized repair cost. We show the tradeoff between these costs and the code’s fault tolerance, and that different approaches offer different choices in this tradeoff. Our experimental evaluation on a Ceph cluster further demonstrates the different effects of realistic system bottlenecks on the benefit from each LRC approach. Despite these differences, the normalized repair cost metric can reliably identify the LRC approach that would achieve the lowest repair cost in each setup.
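The tradeoff the abstract describes, where extra local parity blocks make single-block repairs cheaper but add blocks that must themselves be repaired, can be illustrated with a small sketch. Everything in it is an assumption made for illustration and is not taken from the paper: the stripe layout (k data blocks split evenly into groups, one XOR local parity per group, global parities that depend on all k data blocks), the function names, and the simplified reading of the metric names as "blocks read per repair", averaged over data blocks, averaged over all blocks, and summed over all blocks divided by k.

```python
def repair_costs(k, local_groups, global_parities):
    """Blocks read to repair each single block of a toy LRC stripe.

    Assumed layout (illustrative only): k data blocks in `local_groups`
    equal groups, one local parity per group, and `global_parities`
    parities that each depend on all k data blocks.
    """
    if local_groups <= 0 or k % local_groups != 0:
        raise ValueError("k must divide evenly into a positive number of groups")
    group_size = k // local_groups
    # A data block is rebuilt from the rest of its group plus the local parity.
    data = [group_size] * k
    # A local parity is recomputed from the data blocks of its group.
    local = [group_size] * local_groups
    # Assumption: a global parity is recomputed from all k data blocks.
    glob = [k] * global_parities
    return data, local, glob


def summarize(k, local_groups, global_parities):
    """Storage overhead and simplified repair metrics for the toy stripe."""
    data, local, glob = repair_costs(k, local_groups, global_parities)
    all_costs = data + local + glob
    n = len(all_costs)  # total blocks in the stripe
    return {
        "overhead": n / k,
        "avg_degraded_read": sum(data) / k,      # data blocks only
        "avg_repair": sum(all_costs) / n,        # all blocks
        "normalized_repair": sum(all_costs) / k, # total cost per data block
    }


if __name__ == "__main__":
    for groups in (2, 4):
        print(groups, summarize(k=12, local_groups=groups, global_parities=2))
```

Under these assumptions, moving from two to four local groups for k = 12 raises the overhead from 16/12 to 18/12 while lowering the simplified normalized repair figure from 9.0 to 6.0. The paper's own definitions and code constructions may differ from this toy model.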

Funder

ISF

NSF

NSF-BSF

Publisher

Association for Computing Machinery (ACM)

Subject

Hardware and Architecture

Link

https://dl.acm.org/doi/pdf/10.1145/3381832

References: 59 articles.

Cited by 15 articles.

1. Parallelized In-Network Aggregation for Failure Repair in Erasure-Coded Storage Systems;IEEE/ACM Transactions on Networking;2024-08

2. Optimal Wide Stripe Generation in Locally Repairable Codes via Staged Stripe Merging;2024 IEEE 44th International Conference on Distributed Computing Systems (ICDCS);2024-07-23

3. Design of flexible Compression Transmission Framework for Distributed Storage System Based on Erasure Coding;2023 5th International Conference on Frontiers Technology of Information and Computer (ICFTIC);2023-11-17

4. Practical Design Considerations for Wide Locally Recoverable Codes (LRCs);ACM Transactions on Storage;2023-11-14

5. MDTUpdate: A Multi-Block Double Tree Update Technique in Heterogeneous Erasure-Coded Clusters;IEEE Transactions on Computers;2023-10