Abstract
Ensuring the computational correctness of parallel applications and improving the utilization of dynamic computing resources are important research issues in distributed computing systems. Building on a previous high-performance distributed computing system, a fault-tolerant task scheduler was developed that combines a breathe mechanism, a fault-discovery mechanism, and a subtask-rescheduling mechanism. Experiments show that the fault-tolerant task scheduler performs well and ensures computational correctness even when some computing resources fail.
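The way the three mechanisms could fit together is sketched below in Python. This is only an illustration, not the scheduler described in the paper: it assumes the breathe mechanism is a periodic liveness message from each worker, that fault discovery is a simple timeout on those messages, and that rescheduling moves unfinished subtasks to the least-loaded live worker. The class `Scheduler`, its methods, and the `timeout` parameter are all hypothetical names.

```python
from dataclasses import dataclass, field


@dataclass
class Scheduler:
    """Hypothetical scheduler; every name and policy here is an assumption."""
    timeout: float                                  # silence (seconds) before a worker is presumed failed
    last_seen: dict = field(default_factory=dict)   # worker -> time of its last liveness message
    assigned: dict = field(default_factory=dict)    # subtask -> worker currently responsible
    done: set = field(default_factory=set)          # subtasks whose results were received

    def breathe(self, worker, now):
        """Breathe mechanism (assumed): record a periodic liveness message."""
        self.last_seen[worker] = now

    def assign(self, subtask, worker):
        self.assigned[subtask] = worker

    def complete(self, subtask):
        self.done.add(subtask)

    def failed_workers(self, now):
        """Fault discovery (assumed): workers silent for longer than the timeout."""
        return {w for w, t in self.last_seen.items() if now - t > self.timeout}

    def _load(self, worker):
        """Number of unfinished subtasks currently assigned to a worker."""
        return sum(1 for s, w in self.assigned.items()
                   if w == worker and s not in self.done)

    def reschedule(self, now):
        """Subtask rescheduling (assumed): move unfinished work off failed workers."""
        failed = self.failed_workers(now)
        live = sorted(w for w in self.last_seen if w not in failed)
        moved = {}
        if not live:
            return moved  # no live worker to take over the work
        for subtask, worker in sorted(self.assigned.items()):
            if worker in failed and subtask not in self.done:
                target = min(live, key=self._load)  # least-loaded live worker
                self.assigned[subtask] = target
                moved[subtask] = target
        return moved


if __name__ == "__main__":
    s = Scheduler(timeout=5.0)
    s.breathe("w1", now=0.0)
    s.breathe("w2", now=0.0)
    s.assign("t1", "w1")
    s.assign("t2", "w2")
    s.breathe("w2", now=8.0)        # w1 has gone silent
    print(s.reschedule(now=10.0))   # t1 moves from w1 to w2
```

Passing the clock value in explicitly keeps the sketch deterministic. A real system would read a monotonic clock and would also need to handle duplicate results from a worker that was presumed failed but later recovers.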
Publisher
Trans Tech Publications, Ltd.