Affiliation:
1. Department of Computer Science, University of Alabama, Tuscaloosa, AL 35487-0290, USA
Abstract
Coordinated checkpointing has low stable storage requirements and simplifies the recovery process by reserving a set of consistent global checkpoints. Unfortunately, most of the algorithms proposed either incurred a high communication overhead or blocked all processes. A nonblocking coordinated algorithm was then presented that forced only a subset of the processes to participate in a checkpointing event. That algorithm was shown to create inconsistencies in some situations, and new algorithms for taking consistent checkpoints were proposed. However, we found that these algorithms can still produce inconsistencies under behavior that is typical of a distributed environment, such as multiple forced checkpoints and multiple concurrent checkpoint initiations. In this paper, we identify the inconsistencies that can occur and present an efficient nonblocking algorithm that collects consistent global checkpoints and avoids some of the pitfalls of distributed nonblocking checkpointing.
Publisher
World Scientific Pub Co Pte Lt
Subject
Computer Networks and Communications
Cited by
2 articles.