Affiliation:
1. University of Wisconsin – Madison, WI
2. University of Texas – Austin, TX
Abstract
We introduce protocol-aware recovery (PAR), a new approach that exploits protocol-specific knowledge to correctly recover from storage faults in distributed systems. We demonstrate the efficacy of PAR through the design and implementation of corruption-tolerant replication (CTRL), a PAR mechanism specific to replicated state machine (RSM) systems. We experimentally show that the CTRL versions of two systems, LogCabin and ZooKeeper, safely recover from storage faults and provide high availability, while the unmodified versions can lose data or become unavailable. We also show that the CTRL versions achieve this reliability with little performance overhead.
Funder
U.S. Department of Energy
National Science Foundation
Publisher
Association for Computing Machinery (ACM)
Subject
Hardware and Architecture
Cited by
5 articles.
1. The Dirty Secret of SSDs: Embodied Carbon;ACM SIGEnergy Energy Informatics Review;2023-10
2. A Study of Failure Recovery and Logging of High-Performance Parallel File Systems;ACM Transactions on Storage;2022-04-28
3. Hampa: Solver-Aided Recency-Aware Replication;Computer Aided Verification;2020
4. Shipborne Robust Message Queuing Service Prototype System;2019 5th International Conference on Big Data Computing and Communications (BIGCOM);2019-08
5. #DeleteFacebook: Antecedents of Facebook Fatigue;Cyberpsychology, Behavior, and Social Networking;2019-06