Author:
Goulart Henrique, Franco Álvaro, Mendizabal Odorico
Abstract
This paper concisely reviews checkpointing techniques in distributed systems, covering coordinated and uncoordinated checkpointing, incremental checkpoints, fuzzy checkpoints, adaptive checkpoint intervals, and kernel-based and user-space checkpoints. The review outlines how each checkpointing approach works and discusses its advantages and drawbacks. It also gives a brief overview of how checkpoints are adopted in different distributed computing contexts, including Database Management Systems (DBMS), State Machine Replication (SMR), and High-Performance Computing (HPC) environments. Additionally, the paper briefly explores checkpointing strategies in modern cloud and container environments, discussing their role in live migration and application state management. By summarizing the historical development, advances, and challenges of checkpointing techniques, the review offers insight into their adoption and application across these contexts.
Publisher
Sociedade Brasileira de Computação - SBC
Cited by
4 articles.
1. Reducing Persistence Overhead in Parallel State Machine Replication through Time-Phased Partitioned Checkpoint;Journal of Internet Services and Applications;2024-07-26
2. The State of Container Checkpointing with CRIU: A Multi-Case Experience Report;2024 IEEE 21st International Conference on Software Architecture Companion (ICSA-C);2024-06-04
3. Achieving Enhanced Performance Combining Checkpointing and Dynamic State Partitioning;2023 IEEE 35th International Symposium on Computer Architecture and High Performance Computing (SBAC-PAD);2023-10-17
4. A Time-Phased Partitioned Checkpoint Approach to Reduce State Snapshot Overhead;12th Latin-American Symposium on Dependable and Secure Computing;2023-10-16