Affiliation:
1. Linköping University, Sweden
2. Indian Institute of Science, India
Abstract
Rapid advances in semiconductor technology make it possible to manufacture integrated circuits (ICs) with multiple processors, so-called Multi-Processor Systems-on-Chip (MPSoCs). At the same time, ICs manufactured in recent semiconductor technologies are becoming increasingly susceptible to transient faults, which makes fault tolerance necessary. Work on fault tolerance has mainly focused on safety-critical applications; however, the development of semiconductor technologies means that fault tolerance is now also needed for general-purpose systems. Unlike safety-critical systems, where meeting hard deadlines is the main requirement, general-purpose systems benefit more from minimizing the average execution time (AET). The contribution of this chapter is two-fold. First, the authors present a mathematical framework for the analysis of AET. The analysis covers voting, rollback recovery with checkpointing (RRC), and the combination of RRC and voting (CRV). For a given job and soft (transient) error probability, the authors define mathematical formulas for each fault-tolerant technique with the objective of minimizing AET while taking bus communication overhead into account. In addition, for a given number of processors and jobs, the authors define integer linear programming models that minimize AET, including communication overhead. Second, because the error probability is not known at design time and can change during operation, the authors present two techniques, periodic probability estimation (PPE) and aperiodic probability estimation (APE), to estimate the error probability and adjust the fault-tolerant scheme while the IC is in operation.
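To make the trade-off behind RRC concrete, the following is a minimal Python sketch of how AET might depend on the number of checkpoints. It is not the chapter's formula. It assumes a simplified model that is not taken from the source: a job of length `job_time` is split into equal intervals, `error_prob` is the probability that one complete error-free pass over the job fails, each interval is re-executed until it succeeds, and every attempt pays a fixed `checkpoint_overhead`. All function and parameter names are hypothetical.

```python
def aet_rrc(job_time, error_prob, n_checkpoints, checkpoint_overhead):
    """Expected execution time under a simplified rollback-recovery model.

    Assumptions (illustrative, not from the chapter):
    - errors are independent, so each of the n equal intervals succeeds
      with probability (1 - error_prob) ** (1 / n);
    - a failed interval is re-executed until it succeeds, so the expected
      number of attempts per interval is 1 / success probability;
    - every attempt costs the interval length plus a fixed overhead.
    """
    if not 0.0 <= error_prob < 1.0:
        raise ValueError("error_prob must be in [0, 1)")
    if n_checkpoints < 1:
        raise ValueError("n_checkpoints must be at least 1")
    interval_success = (1.0 - error_prob) ** (1.0 / n_checkpoints)
    interval_cost = job_time / n_checkpoints + checkpoint_overhead
    return n_checkpoints * interval_cost / interval_success


def best_checkpoint_count(job_time, error_prob, checkpoint_overhead,
                          max_checkpoints=100):
    """Scan 1..max_checkpoints and return the count with the lowest AET."""
    return min(
        range(1, max_checkpoints + 1),
        key=lambda n: aet_rrc(job_time, error_prob, n, checkpoint_overhead),
    )


if __name__ == "__main__":
    n = best_checkpoint_count(job_time=100.0, error_prob=0.5,
                              checkpoint_overhead=1.0)
    print(n, aet_rrc(100.0, 0.5, n, 1.0))
```

Under this model, more checkpoints reduce the work lost to each error but add overhead, so the minimum lies at an intermediate checkpoint count whenever both the error probability and the overhead are non-zero. An estimate of the error probability obtained at run time could be fed into `best_checkpoint_count` to re-tune the scheme, which is the kind of adjustment the abstract attributes to PPE and APE.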