Affiliation:
1. Michigan Technological University
2. Michigan State University
Abstract
The complexity of designing programs that simultaneously tolerate multiple classes of faults, called
multitolerant
programs, is in part due to the conflicting nature of the fault tolerance requirements that must be met by a multitolerant program when different types of faults occur. To facilitate the design of multitolerant programs, we present sound and (deterministically) complete algorithms for stepwise design of two families of multitolerant programs in a high atomicity program model, where a process can read and write all program variables in an atomic step. We illustrate that if one needs to design failsafe (respectively, nonmasking) fault tolerance for one class of faults and masking fault tolerance for another class of faults, then a multitolerant program can be designed in separate polynomial-time (in the state space of the fault-intolerant program) steps regardless of the order of addition. This result has a significant methodological implication in that designers need not be concerned about unknown fault tolerance requirements that may arise due to unanticipated types of faults. Further, we illustrate that if one needs to design failsafe fault tolerance for one class of faults and nonmasking fault tolerance for a different class of faults, then the resulting problem is NP-complete in program state space. This is a counterintuitive result in that designing failsafe and nonmasking fault tolerance for the same class of faults can be done in polynomial time. We also present sufficient conditions for polynomial-time design of
failsafe-nonmasking
multitolerance. Finally, we demonstrate the stepwise design of multitolerance for a stable disk storage system, a token ring network protocol and a repetitive agreement protocol that tolerates Byzantine and transient faults. Our automatic approach decreases the design time from days to a few hours for the token ring program that is our largest example with 200 million reachable states and 8 processes.
Funder
Office of Naval Research
Division of Computer and Network Systems
Defense Advanced Research Projects Agency
Division of Computing and Communication Foundations
National Science Foundation
Air Force Office of Scientific Research
Publisher
Association for Computing Machinery (ACM)
Cited by
8 articles.
订阅此论文施引文献
订阅此论文施引文献,注册后可以免费订阅5篇论文的施引文献,订阅后可以查看论文全部施引文献