Affiliation:
1. Arizona State Univ., Tempe, Arizona
2. Intel Corp., Hillsboro, Oregon
Abstract
The difficulty of designing fault-tolerant distributed algorithms increases with the severity of failures that an algorithm must tolerate, especially for systems with synchronous message passing. This paper considers methods that automatically translate algorithms tolerant of simple crash failures into ones tolerant of more severe failures. These translations simplify the design task by allowing algorithm designers to assume that processors fail only by stopping. Such translations can be quantified by two measures: fault-tolerance, which is a measure of how many processors must remain correct for the translation to be correct, and round-complexity, which is a measure of how the translation increases the running time of an algorithm. Understanding these translations and their limitations with respect to these measures can provide insight into the relative impact of different models of faulty behavior on the ability to provide fault-tolerant applications for systems with synchronous message passing.
This paper considers translations from crash failures to each of the following types of more severe failures: omission to send messages; omission to send and receive messages; and totally arbitrary behavior. It shows that previously developed translations to send-omission failures are optimal with respect to both fault-tolerance and round-complexity. It exhibits a hierarchy of translations to general (send/receive) omission failures that improves upon the fault-tolerance of previously developed translations. These translations are optimal in that they cannot be improved with respect to one measure without negatively affecting the other; that is, the hierarchy of translations is matched by a corresponding hierarchy of impossibility results. The paper also gives a hierarchy of translations to arbitrary failures that improves upon the round-complexity of previously developed translations. These translations are near-optimal;
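The difference between the crash, send-omission, and general-omission models can be made concrete with a small sketch. It is not taken from the paper: the function name `consistent_with`, the model labels, and the representation of a run as per-round sets of undelivered `(sender, receiver)` messages are all assumptions made for illustration, and arbitrary failures are not modeled. The sketch assumes every processor sends to every other processor in each synchronous round, and checks whether a given pattern of lost messages could occur under each model.

```python
from typing import List, Set, Tuple

# Hypothetical representation: one set per round, holding the
# (sender, receiver) pairs whose message was NOT delivered.
Loss = Set[Tuple[int, int]]


def consistent_with(model: str, n: int, faulty: Set[int], lost: List[Loss]) -> bool:
    """Return True if the loss pattern could occur under the given failure model.

    Assumes n processors numbered 0..n-1 that send to all others every round.
    """
    if model == "general-omission":
        # A lost message may be blamed on a faulty sender OR a faulty receiver.
        return all(s in faulty or r in faulty for rnd in lost for (s, r) in rnd)

    # Crash and send-omission failures can only be blamed on the sender.
    if not all(s in faulty for rnd in lost for (s, _) in rnd):
        return False

    if model == "send-omission":
        # A faulty sender may omit any messages, intermittently, in any round.
        return True

    if model == "crash":
        crashed: Set[int] = set()
        for rnd in lost:
            # A processor that crashed in an earlier round sends nothing now.
            for p in crashed:
                all_from_p = {(p, q) for q in range(n) if q != p}
                if not all_from_p <= rnd:
                    return False
            # In its crash round a processor may still reach a subset of peers.
            crashed |= {s for (s, _) in rnd}
        return True

    raise ValueError(f"unknown model: {model}")


if __name__ == "__main__":
    faulty = {0}
    stops = [{(0, 1)}, {(0, 1), (0, 2)}]   # processor 0 stops for good
    resumes = [{(0, 1)}, set()]            # processor 0 omits once, then resumes
    deaf = [{(1, 0)}]                      # processor 0 fails to receive
    for name, run in [("stops", stops), ("resumes", resumes), ("deaf", deaf)]:
        for model in ("crash", "send-omission", "general-omission"):
            print(name, model, consistent_with(model, 3, faulty, run))
```

In this sketch each model admits every loss pattern that the previous one admits, which is the sense in which the failures grow more severe: an algorithm designed only for crash failures has no guarantee on a run such as `resumes` or `deaf` unless a translation handles the extra behavior.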
Publisher
Association for Computing Machinery (ACM)
Subject
Artificial Intelligence, Hardware and Architecture, Information Systems, Control and Systems Engineering, Software
Cited by
20 articles.