Affiliation:
1. University of Wisconsin - Madison, Madison, WI, USA
Abstract
Recovery functionality has many applications in computing systems, from speculation recovery in modern microprocessors to fault recovery in high-reliability systems. Modern systems commonly recover using checkpoints. However, checkpoints introduce overheads, add complexity, and often save more state than necessary.
This paper develops a novel compiler technique to recover program state without the overheads of explicit checkpoints. The technique breaks programs into idempotent regions, regions that can be freely re-executed, so that recovery requires no checkpointed state: because each region is idempotent, recovery reduces to simple re-execution. We develop static analysis techniques to construct these regions and demonstrate low overheads and large region sizes for an LLVM-based implementation. Across a set of diverse benchmark suites, we construct idempotent regions close in size to those that could be obtained with perfect runtime information. Although the resulting code runs more slowly, typical performance overheads are in the range of just 2-12%.
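The following simplified sketch illustrates what makes a region idempotent. It greedily partitions a straight-line sequence of operations and cuts before any write that would overwrite a value the region already read as an input. It is not the paper's static analysis, which works on an LLVM-based implementation. The operation model (abstract read/write location sets, with read-modify-writes split into separate loads and stores), the greedy cut rule, and the names `partition_idempotent` and `ops_from` are all hypothetical.

```python
from typing import FrozenSet, Iterable, List, Set, Tuple

# Hypothetical IR model: each operation lists the abstract locations it reads
# and writes. A read-modify-write is assumed to be split into a separate load
# and store, so no single operation reads and writes the same location.
Op = Tuple[FrozenSet[str], FrozenSet[str]]


def partition_idempotent(ops: List[Op]) -> List[List[int]]:
    """Greedily cut a straight-line op sequence into idempotent regions.

    A region stops being idempotent once it overwrites a location it read as
    an input (read before any write of that location inside the region).
    Re-executing it would then see the new value instead of the original,
    so this sketch starts a new region immediately before such a write.
    """
    regions: List[List[int]] = []
    current: List[int] = []
    live_in: Set[str] = set()   # locations the current region read as inputs
    written: Set[str] = set()   # locations the current region has written

    for i, (reads, writes) in enumerate(ops):
        if reads & writes:
            raise ValueError(f"op {i} reads and writes the same location")
        if writes & live_in:
            # Cut: this write would clobber one of the region's inputs.
            regions.append(current)
            current, live_in, written = [], set(), set()
        live_in |= reads - written   # reads of not-yet-written locations are inputs
        written |= writes
        current.append(i)

    if current:
        regions.append(current)
    return regions


def ops_from(spec: Iterable[Tuple[Iterable[str], Iterable[str]]]) -> List[Op]:
    """Build ops from (reads, writes) pairs, for readability."""
    return [(frozenset(r), frozenset(w)) for r, w in spec]


if __name__ == "__main__":
    # t = a; b = t + 1; a = b   (the final store clobbers the input 'a')
    program = ops_from([
        ({"a"}, {"t"}),
        ({"t"}, {"b"}),
        ({"b"}, {"a"}),
    ])
    print(partition_idempotent(program))  # [[0, 1], [2]]
```

Each resulting region can be re-run from its start and produce the same result, because none of its inputs has been overwritten. A real compiler must also handle branches, loops, and register state, which this straight-line sketch ignores.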
We call the paradigm of executing entire programs as a series of idempotent regions "idempotent processing," and it has many applications in computer systems. As a concrete example, we apply it to compiler-automated hardware fault recovery. Compared with two other state-of-the-art techniques, redundant execution and checkpoint logging, our idempotent processing technique outperforms both by over 15%.
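As a toy illustration of recovery by re-execution, the sketch below runs a region, and if a fault is detected it simply re-runs the region from its entry with no saved state. It assumes a hypothetical fault model in which a transient fault corrupts a data value (not an address) and is detected before the region ends; the detector, the fault injection, and all names (`run_with_reexecution`, `TransientFault`, the two example regions) are invented for illustration and are not the paper's mechanism. It contrasts an idempotent region with one that overwrites its own input.

```python
from typing import Callable, Dict

Memory = Dict[str, int]


class TransientFault(Exception):
    """Raised when a (hypothetical) detector flags a fault in a region."""


def run_with_reexecution(region: Callable[[Memory, bool], None],
                         mem: Memory, max_retries: int = 3) -> int:
    """Recover from detected faults by re-running the region from its entry.

    No state is checkpointed, so this is only correct if the region is
    idempotent, i.e. it never overwrites a location it reads as input.
    Returns the number of attempts used.
    """
    for attempt in range(1, max_retries + 1):
        try:
            # For demonstration, a fault is injected on the first attempt only.
            region(mem, attempt == 1)
            return attempt
        except TransientFault:
            continue  # re-execute from the region's entry point
    raise RuntimeError("fault persisted across retries")


def idempotent_region(mem: Memory, inject: bool) -> None:
    # Reads 'a', writes 't' and 'b'; never overwrites its input 'a'.
    mem["t"] = mem["a"] * 2
    if inject:
        mem["t"] ^= 1 << 4        # a corrupted value reaches memory...
        raise TransientFault      # ...and the fault is then detected
    mem["b"] = mem["t"] + 1


def non_idempotent_region(mem: Memory, inject: bool) -> None:
    # Overwrites its own input 'a' before the fault is detected.
    mem["a"] = mem["a"] + 1
    if inject:
        raise TransientFault


if __name__ == "__main__":
    m = {"a": 5}
    run_with_reexecution(idempotent_region, m)
    print(m["b"])  # 11, same as a fault-free run

    n = {"a": 5}
    run_with_reexecution(non_idempotent_region, n)
    print(n["a"])  # 7, not 6: re-execution re-read a clobbered input
```

The corrupted store in the idempotent region lands on `t`, which the retry rewrites before any use, so re-execution alone restores a correct state. The non-idempotent region shows why regions must not clobber their inputs when no checkpoint is kept.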
Publisher
Association for Computing Machinery (ACM)
Subject
Computer Graphics and Computer-Aided Design, Software
Cited by
34 articles.
1. Ensuring consistent recovery under power failure with minimal NVM write overhead;Journal of Systems Architecture;2024-03
2. VERLIB: Concurrent Versioned Pointers;Proceedings of the 29th ACM SIGPLAN Annual Symposium on Principles and Practice of Parallel Programming;2024-02-20
3. RTailor: Parameterizing Soft Error Resilience for Mixed-Criticality Real-Time Systems;2023 IEEE Real-Time Systems Symposium (RTSS);2023-12-05
4. A Type System for Safe Intermittent Computing;Proceedings of the ACM on Programming Languages;2023-06-06
5. Featherweight Soft Error Resilience for GPUs;2022 55th IEEE/ACM International Symposium on Microarchitecture (MICRO);2022-10