Feasibility of Stepwise Design of Multitolerant Programs

Author:

Ebnenasir Ali1,Kulkarni Sandeep S.2

Affiliation:

1. Michigan Technological University

2. Michigan State University

Abstract

The complexity of designing programs that simultaneously tolerate multiple classes of faults, called multitolerant programs, is in part due to the conflicting nature of the fault tolerance requirements that must be met by a multitolerant program when different types of faults occur. To facilitate the design of multitolerant programs, we present sound and (deterministically) complete algorithms for stepwise design of two families of multitolerant programs in a high atomicity program model, where a process can read and write all program variables in an atomic step. We illustrate that if one needs to design failsafe (respectively, nonmasking) fault tolerance for one class of faults and masking fault tolerance for another class of faults, then a multitolerant program can be designed in separate polynomial-time (in the state space of the fault-intolerant program) steps regardless of the order of addition. This result has a significant methodological implication in that designers need not be concerned about unknown fault tolerance requirements that may arise due to unanticipated types of faults. Further, we illustrate that if one needs to design failsafe fault tolerance for one class of faults and nonmasking fault tolerance for a different class of faults, then the resulting problem is NP-complete in program state space. This is a counterintuitive result in that designing failsafe and nonmasking fault tolerance for the same class of faults can be done in polynomial time. We also present sufficient conditions for polynomial-time design of failsafe-nonmasking multitolerance. Finally, we demonstrate the stepwise design of multitolerance for a stable disk storage system, a token ring network protocol and a repetitive agreement protocol that tolerates Byzantine and transient faults. Our automatic approach decreases the design time from days to a few hours for the token ring program that is our largest example with 200 million reachable states and 8 processes.

Funder

Office of Naval Research

Division of Computer and Network Systems

Defense Advanced Research Projects Agency

Division of Computing and Communication Foundations

National Science Foundation

Air Force Office of Scientific Research

Publisher

Association for Computing Machinery (ACM)

Subject

Software

Cited by 8 articles. 订阅此论文施引文献 订阅此论文施引文献,注册后可以免费订阅5篇论文的施引文献,订阅后可以查看论文全部施引文献

1. A theory of integrating tamper evidence with stabilization;Science of Computer Programming;2018-08

2. Automation of fault-tolerant graceful degradation;Distributed Computing;2017-12-16

3. Bounded Auditable Restoration of Distributed Systems;IEEE Transactions on Computers;2016

4. Auditable Restoration of Distributed Programs;2015 IEEE 34th Symposium on Reliable Distributed Systems (SRDS);2015-09

5. A Theory of Integrating Tamper Evidence with Stabilization;Fundamentals of Software Engineering;2015

同舟云学术

1.学者识别学者识别

2.学术分析学术分析

3.人才评估人才评估

"同舟云学术"是以全球学者为主线,采集、加工和组织学术论文而形成的新型学术文献查询和分析系统,可以对全球学者进行文献检索和人才价值评估。用户可以通过关注某些学科领域的顶尖人物而持续追踪该领域的学科进展和研究前沿。经过近期的数据扩容,当前同舟云学术共收录了国内外主流学术期刊6万余种,收集的期刊论文及会议论文总量共计约1.5亿篇,并以每天添加12000余篇中外论文的速度递增。我们也可以为用户提供个性化、定制化的学者数据。欢迎来电咨询!咨询电话:010-8811{复制后删除}0370

www.globalauthorid.com

TOP

Copyright © 2019-2024 北京同舟云网络信息技术有限公司
京公网安备11010802033243号  京ICP备18003416号-3