1. Reliability issues in computing system design;Randell;ACM Computing Surveys,1978
2. A survey of rollback-recovery protocols in message-passing systems;Elnozahy;ACM Computing Surveys,2002
3. System structure for software fault tolerance;Randell,1975
4. Failure tolerance in petascale computers;Gibson;CTWatch Quarterly,2007
5. C. Engelmann, G.A. Geist, Super-scalable algorithms for computing on 100,000 processors, in: Lecture Notes in Computer Science: Proceedings of the 5th International Conference on Computational Science, ICCS, 2005, Part I, Atlanta, GA, USA, May 22–25, vol. 3514, 2005, pp. 313–320.