Abstract
This paper describes how to design a software-based fault tolerant application using microprocessor (MP), in order to tolerate the burst errors in memory. This approach may be called a single -- version scheme (SVS). The SVS relies on a single version application program which is enhanced with self-checking code redundancy to tolerate memory burst errors that are difficult to correct during the run-time of an application. Conventionally, the other software based approaches can detect a few bit errors (in memory) only towards fail-stop kind of fault tolerance against transient bit errors. Reed Solomon codes are mainly effective for burst errors in coding of audio Compact Disks at offline only. The proposed online technique does not need multiple versions of software and multiple machines. This approach employs only two copies of the application software running on one machine only. Two copies of the enhanced version version of an application are used here for online error detection and tolerance thereof as well. This is an effective low-cost online tool for hardening a microprocessor-based industrial computing system or for on-chip DRAM applications using an affordable code and time redundancy against the burst errors in processor memory. The SVS aims to provide a non-fail-stop kind of fault tolerance against burst errors. This approach supplements the Error Correcting Codes (ECC) in memory system also, against both the transient and permanent bit errors in memory.
Publisher
Association for Computing Machinery (ACM)
Reference19 articles.
1. IBM S/390 Parallel Enterprise Server G5 fault tolerance: A historical perspective
2. Error Control Systems for Digital Communication and Storage;Wicker Stephen B.;Prentice Hall, NJ, USA,1995
3. Design - Fault Tolerance," in The Evolution of Fault-Tolerant Computing, A. Avizienis, H. Kopertz, and J.-C. Laprie, eds., Springer-Verlag;Randell B.;Vienna,1987
4. The N-Version Approach to Fault-Tolerant Software