Affiliations:
1. Banaras Hindu University, India
2. Atal Bihari Vajpayee Indian Institute of Information Technology and Management Gwalior, India
Abstract
Existing failure detection algorithms for distributed systems are designed to work in asynchronous or partially synchronous environments on fully connected (all-to-all mesh) systems, where each process maintains the status of every other process. Several real-time systems, however, are hierarchically connected and must operate in strictly synchronous environments. Using existing failure detectors in such systems would incur excessive computation and communication overhead. This chapter describes two suspicion-based failure detectors, of the Strong (S) and Perfect (P) classes, for hierarchical distributed systems operating in time-synchronous environments. The Strong (S) class algorithm can detect permanent crash failures, omission failures, link failures, and timing failures; its strong completeness and weak accuracy properties are evaluated. The Perfect (P) class failure detector can detect crash failures, crash-recovery failures, omission failures, link failures, and timing failures; its strong completeness and strong accuracy properties are evaluated.