Optimal recovery of single disk failure in RDP code storage systems

Published: 2010-06-12, Issue: 1, Volume: 38, Pages: 119-130
ISSN: 0163-5999
Container title: ACM SIGMETRICS Performance Evaluation Review
Language: en
Short container title: SIGMETRICS Perform. Eval. Rev.

Author:

Xiang Liping¹, Xu Yinlong¹, Lui John C.S.², Chang Qian¹

Affiliation:

1. University of Science and Technology of China, Hefei, China

2. The Chinese University of Hong Kong, Hong Kong, Hong Kong

Abstract

Modern storage systems use thousands of inexpensive disks to meet the storage requirements of applications. To enhance data availability, some form of redundancy is used. For example, conventional RAID-5 systems tolerate only a single disk failure, while more recent coding techniques such as row-diagonal parity (RDP) tolerate up to two disk failures. To reduce the probability of data unavailability, disk recovery (or rebuild) is carried out whenever a single disk fails. We show that the conventional recovery scheme of RDP code for a single disk failure is inefficient and suboptimal. In this paper, we propose an optimal and efficient disk recovery scheme, Row-Diagonal Optimal Recovery (RDOR), for single disk failure of RDP code that has the following properties: (1) it is read optimal in the sense that it issues the smallest number of disk reads to recover the failed disk; (2) it has the load balancing property that all surviving disks are subjected to the same amount of additional workload in rebuilding the failed disk. We carefully explore the design state space and theoretically show the optimality of RDOR. We carry out a performance evaluation to quantify the merits of RDOR on some widely used disks.
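The read-count argument in the abstract can be illustrated with a small sketch. It assumes the standard RDP layout, which the abstract itself does not spell out: a prime p, p-1 data disks, one row parity disk, one diagonal parity disk, and p-1 rows per stripe, with diagonal p-1 left unstored. The function names (`rdp_encode`, `reads_for`, `recover`, `best_plan`) are hypothetical, and the brute-force search is only a stand-in for exploring the design space. It is not the RDOR construction from the paper and it ignores load balancing.

```python
from itertools import product

def rdp_encode(data, p):
    """Build one RDP stripe from (p-1) rows x (p-1) data disks of ints.

    Assumed layout: disks 0..p-2 hold data, disk p-1 holds row parity,
    disk p holds diagonal parity. Block (i, j) with j < p lies on
    diagonal (i + j) % p; diagonal p-1 has no stored parity.
    """
    rows = p - 1
    stripe = [list(r) + [0, 0] for r in data]
    for i in range(rows):
        for j in range(p - 1):
            stripe[i][p - 1] ^= stripe[i][j]      # row parity
    for k in range(rows):
        for j in range(p):
            i = (k - j) % p
            if i < rows:
                stripe[k][p] ^= stripe[i][j]      # diagonal parity
    return stripe

def reads_for(p, f, i, method):
    """Blocks (row, disk) to read to rebuild block (i, f) of data disk f."""
    if method == "row":
        return {(i, j) for j in range(p) if j != f}
    k = (i + f) % p
    if k == p - 1:
        return None                               # block is on the unstored diagonal
    blocks = {((k - j) % p, j) for j in range(p)
              if j != f and (k - j) % p < p - 1}
    blocks.add((k, p))                            # diagonal parity block
    return blocks

def recover(stripe, p, f, plan):
    """Rebuild disk f using plan[i] in {"row", "diag"}; return (column, reads)."""
    read, rebuilt = set(), []
    for i, method in enumerate(plan):
        blocks = reads_for(p, f, i, method)
        if blocks is None:
            raise ValueError("row %d cannot use diagonal parity" % i)
        read |= blocks                            # a block shared by two sets is read once
        value = 0
        for r, c in blocks:
            value ^= stripe[r][c]
        rebuilt.append(value)
    return rebuilt, len(read)

def best_plan(p, f):
    """Exhaustively find a plan with the fewest distinct block reads."""
    best = None
    for plan in product(("row", "diag"), repeat=p - 1):
        sets = [reads_for(p, f, i, m) for i, m in enumerate(plan)]
        if any(s is None for s in sets):
            continue
        n = len(set().union(*sets))
        if best is None or n < best[0]:
            best = (n, plan)
    return best

p = 5
data = [[(7 * i + 3 * j + 1) % 251 for j in range(p - 1)] for i in range(p - 1)]
stripe = rdp_encode(data, p)
conventional = recover(stripe, p, 0, ("row",) * (p - 1))[1]
hybrid_reads, hybrid_plan = best_plan(p, 0)
print(conventional, hybrid_reads, hybrid_plan)
```

Under these assumptions, with p = 5 the all-row-parity plan reads 16 blocks, while mixing row and diagonal parity brings the count down to 12, because each row set and each diagonal set share one block that needs to be read only once. The exhaustive search is exponential in p and is meant only to make the counting concrete for small stripes.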

Publisher

Association for Computing Machinery (ACM)

Subject

Computer Networks and Communications, Hardware and Architecture, Software

Link

https://dl.acm.org/doi/pdf/10.1145/1811099.1811054

References (34 articles)

1. A fresh look at the reliability of long-term digital storage

2. EVENODD: an efficient scheme for tolerating double disk failures in RAID architectures

3. J. Bucy, J. Schindler, S. Schlosser, and G. Ganger. The DiskSim simulation environment (v4.0). http://www.pdl.cmu.edu/DiskSim/.

4. The Google file system

Cited by 69 articles.

1. A Parallel Partial Merge Repair Algorithm for Multi-block Failures for Erasure Storage Systems;2024 IEEE International Parallel and Distributed Processing Symposium (IPDPS);2024-05-27

2. Enabling Efficient Erasure Coding in Disaggregated Memory Systems;IEEE Transactions on Parallel and Distributed Systems;2024-01

3. Tunable Sparing of Disks in a Cloud Data Center;2023 7th International Conference on Computer Applications in Electrical Engineering-Recent Advances (CERA);2023-10-27

4. Elastic RAID: Implementing RAID over SSDs with Built-in Transparent Compression;Proceedings of the 16th ACM International Conference on Systems and Storage;2023-06-05

5. Erasure Codes for Cold Data in Distributed Storage Systems;Applied Sciences;2023-02-08