Distributed Application Checkpointing for Replicated State Machines

Published: 2021-02-09 Issue: 1 Volume: 22 Pages: 67-79
ISSN: 1895-1767
Container title: Scalable Computing: Practice and Experience
Short container title: SCPE

Authors:

Çelikel Özdinç, Ovatman Tolga

Abstract

Application checkpointing is a widely used recovery mechanism in which an application's state is saved periodically so that it can be restored in case of a failure. In this study, we investigate the use of distributed checkpointing for replicated state machines. Conventionally, checkpoint data for a replicated state machine is either replicated in each of the replicas or stored separately in a single instance. Distributed checkpointing makes it possible to adjust the level of fault tolerance of the checkpointing approach at the expense of recovery time. We use a local cluster and a cloud environment to examine the effects of distributed checkpointing on a simple state machine example and compare the results with the conventional approaches. As expected, distributed checkpointing reduces memory consumption and supports different levels of fault tolerance, while performing worse in terms of recovery time.
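To make the trade-off above concrete, here is a minimal Python sketch. It assumes, purely for illustration, that the checkpoint is split into equal chunks and that each chunk is copied to `copies` replicas in round-robin order; the names `place_chunks` and `recovery_sources` are hypothetical, and the placement and recovery scheme used in the study may differ. With `copies` equal to the number of replicas, every replica holds the full checkpoint (the conventional replicated approach); smaller values cut per-replica memory and still tolerate any `copies - 1` replica failures, but recovery has to gather chunks from several replicas.

```python
"""Illustrative chunked checkpoint placement across state-machine replicas.

Assumption (not from the paper): the checkpoint is split into equal chunks and
each chunk is stored on `copies` consecutive replicas in round-robin order.
"""
from __future__ import annotations


def place_chunks(num_chunks: int, num_replicas: int, copies: int) -> dict[int, list[int]]:
    """Return a mapping replica id -> chunk ids stored on that replica.

    Chunk c goes to replicas c, c+1, ..., c+copies-1 (mod num_replicas), which
    are distinct because copies <= num_replicas. Hence any copies-1 replica
    failures still leave at least one copy of every chunk.
    """
    if not 1 <= copies <= num_replicas:
        raise ValueError("copies must be between 1 and num_replicas")
    placement: dict[int, list[int]] = {r: [] for r in range(num_replicas)}
    for chunk in range(num_chunks):
        for offset in range(copies):
            placement[(chunk + offset) % num_replicas].append(chunk)
    return placement


def recovery_sources(placement: dict[int, list[int]], num_chunks: int,
                     failed: set[int]) -> list[int] | None:
    """Greedily pick surviving replicas to fetch chunks from during recovery.

    Returns the list of source replicas, or None if some chunk has no
    surviving copy. More sources means more remote transfers, which is the
    recovery-time cost of spreading the checkpoint out.
    """
    alive = {r: set(chunks) for r, chunks in placement.items() if r not in failed}
    needed = set(range(num_chunks))
    sources: list[int] = []
    while needed:
        # Pick the surviving replica that covers the most still-missing chunks.
        best = max(alive, key=lambda r: len(alive[r] & needed), default=None)
        if best is None or not (alive[best] & needed):
            return None  # at least one chunk was lost with the failed replicas
        sources.append(best)
        needed -= alive[best]
    return sources


if __name__ == "__main__":
    spread = place_chunks(num_chunks=4, num_replicas=4, copies=2)
    full = place_chunks(num_chunks=4, num_replicas=4, copies=4)
    print(spread)                                # each replica holds 2 of 4 chunks
    print(recovery_sources(spread, 4, {1}))      # [0, 2]
    print(recovery_sources(spread, 4, {0, 1}))   # None: chunk 0 is lost
    print(recovery_sources(full, 4, {0, 1, 2}))  # [3]
```

With `copies=2`, each replica stores half of the checkpoint and any single failure is survivable, but recovery needs two source replicas; the fully replicated layout recovers from a single replica at the cost of storing the whole checkpoint everywhere.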

Publisher

Scalable Computing: Practice and Experience

Subject

General Computer Science

Cited by 2 articles.

1. eHotSnap: An Efficient and Hot Distributed Snapshots System for Virtual Machine Cluster. IEEE Transactions on Parallel and Distributed Systems, 2023-08.

2. Secure and failure hybrid delay enabled a lightweight RPC and SHDS schemes in Industry 4.0 aware IIoHT enabled fog computing. Mathematical Biosciences and Engineering, 2021.