Authors:
Wei Shuangjian, Chen Qiurui
Abstract
Checkpointing can improve the efficiency of simulation backtracking. Existing checkpointing tools are designed mainly for system fault tolerance and load balancing, and offer limited support for persistent data. For simulation backtracking, however, the overhead of setting and restoring checkpoints for persistent data is critical. We designed LibIFC to reduce this overhead in two ways. First, an incremental method reduces the space cost of each checkpoint; second, a two-way recovery method reduces the time cost of checkpoint recovery. Experimental results show that both strategies significantly reduce space overhead and checkpoint recovery time.
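The two ideas in the abstract can be illustrated with a minimal Python sketch. This is not LibIFC's implementation or API: the class `IncrementalStore`, its methods, and the in-memory dictionary standing in for persistent data are all hypothetical. The abstract also does not define "two-way recovery"; the sketch assumes one plausible reading, in which a checkpoint can be reached either by replaying deltas from the base snapshot or by walking deltas from the current position, whichever needs fewer steps.

```python
_MISSING = object()  # sentinel: the key did not exist at that point


def _apply(state, key, value):
    """Set key to value, or delete it when value is the _MISSING sentinel."""
    if value is _MISSING:
        state.pop(key, None)
    else:
        state[key] = value


class IncrementalStore:
    """Hypothetical key-value store with incremental checkpoints."""

    def __init__(self, initial=None):
        self.base = dict(initial or {})  # the only full snapshot (checkpoint 0)
        self.state = dict(self.base)     # live data
        self.deltas = []                 # deltas[i] maps checkpoint i to i + 1
        self.position = 0                # checkpoint the live state is based on
        self._pending = {}               # key -> value before the first write

    def write(self, key, value):
        # Remember the old value only once per key per checkpoint interval.
        self._pending.setdefault(key, self.state.get(key, _MISSING))
        self.state[key] = value

    def checkpoint(self):
        # Backtracking followed by new work discards the abandoned future.
        del self.deltas[self.position:]
        # Incremental: store (old, new) only for keys that changed.
        delta = {k: (old, self.state.get(k, _MISSING))
                 for k, old in self._pending.items()}
        self.deltas.append(delta)
        self._pending = {}
        self.position = len(self.deltas)
        return self.position

    def restore(self, k):
        if not 0 <= k <= len(self.deltas):
            raise IndexError(f"no checkpoint {k}")
        # Roll back writes made since the last checkpoint.
        for key, old in self._pending.items():
            _apply(self.state, key, old)
        self._pending = {}
        # Assumed cost model: one unit per delta applied.
        if k < abs(self.position - k):
            # Cheaper to replay forward from the base snapshot.
            self.state = dict(self.base)
            for delta in self.deltas[:k]:
                for key, (_, new) in delta.items():
                    _apply(self.state, key, new)
            path = "from_base"
        else:
            # Cheaper to walk from the current position, undoing or redoing.
            while self.position > k:
                self.position -= 1
                for key, (old, _) in self.deltas[self.position].items():
                    _apply(self.state, key, old)
            while self.position < k:
                for key, (_, new) in self.deltas[self.position].items():
                    _apply(self.state, key, new)
                self.position += 1
            path = "from_current"
        self.position = k
        return path
```

Counting deltas is a deliberately crude cost model. A real tool would more likely weigh delta sizes and the cost of copying the base snapshot, and would write deltas to disk rather than keep them in memory.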