Affiliation:
1. Student (M.Tech Scholar, ECE), Chandigarh Engineering College, Mohali, India
2. Professor and Head, Department of ECE, Chandigarh Engineering College, Mohali, India
Abstract
In a distributed computing system, fault tolerance is an important issue because if the system fails, the execution of all tasks stops. Fault tolerance is the property of a system that allows it to continue providing correct service even in the presence of faults. A task executed on a real-time distributed system must be feasible and reliable. Real-time distributed systems, for instance grid networks, robotics, and air traffic control systems, depend heavily on time. A single error in a real-time distributed system can cause a whole-system failure if it is not detected accurately and recovered from in time. Fault tolerance is the key method used to provide continued reliability in these systems. Hardware fault tolerance can be achieved by adding extra hardware such as processors, resources, and communication links. A fault may occur in a distributed computing system for numerous reasons, such as network failure or hardware or software failure.
This paper defines various terms, such as failure, fault, faulty environment, fault tolerance, candidate node, and redundancy, and explains fundamental concepts related to fault tolerance in distributed systems. Distributed computing systems raise many issues, such as emergent resource sharing, transparency, dependability, complex mappings, concurrency, and fault tolerance. In this paper we focus on the different fault-tolerant approaches and the fault tolerance terminology used in distributed computing environments.
Subject
General Earth and Planetary Sciences, General Environmental Science