Affiliation:
1. The University of Texas at Austin, Austin, TX, USA
Abstract
We present a general scheme for virtualizing main memory error-correction mechanisms, which maps the redundant information needed to correct errors into the memory namespace itself. This basic idea increases flexibility, which we use to strengthen error protection, improve power efficiency, and reduce system cost, with only small performance overheads. We augment the virtual memory system architecture to detach the physical mapping of data from the physical mapping of its associated ECC information. We then use this mechanism to develop two-tiered error protection techniques that separate the process of detecting errors from the rare need to also correct errors, and thus save energy. We describe how to provide strong chipkill and double-chipkill protection using existing DRAM and packaging technology. We show how to maintain access granularity and redundancy overheads, even when using ×8 DRAM chips. We also evaluate error correction for systems that do not use ECC DIMMs. Overall, analysis of demanding SPEC CPU 2006 and PARSEC benchmarks indicates that performance overhead is only 1% with ECC DIMMs and less than 10% using standard non-ECC DIMM configurations, that DRAM power savings can be as high as 27%, and that the system energy-delay product is improved by 12% on average.
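The two-tiered idea in the abstract can be illustrated with a toy model. The sketch below is not the paper's design: the class `ToyMemory`, the 8-byte line size, the per-byte parity used for detection, and the XOR byte used for correction are all assumptions chosen for brevity. It only shows the structure the abstract describes: detection information travels with the data, correction information lives at a separately mapped address, and the correction tier is read only when an error is detected.

```python
from functools import reduce

LINE_BYTES = 8  # toy line size (assumption, not from the paper)


def parity(byte: int) -> int:
    """Return 1 if the byte has an odd number of set bits, else 0."""
    return bin(byte).count("1") & 1


def xor_all(values) -> int:
    """XOR together an iterable of ints."""
    return reduce(lambda a, b: a ^ b, values, 0)


class ToyMemory:
    """Hypothetical model: tier-1 detection bits sit next to the data,
    tier-2 correction info sits at a separately mapped address."""

    def __init__(self):
        self.data = {}       # data address -> (line bytes, tier-1 parity bits)
        self.ecc_map = {}    # data address -> tier-2 address (stand-in for the extra mapping)
        self.ecc_store = {}  # tier-2 address -> XOR of the line's bytes
        self.t2_reads = 0    # counts how often the correction tier is touched
        self._next_ecc = 0x1000

    def write(self, addr: int, line: bytes) -> None:
        assert len(line) == LINE_BYTES
        self.data[addr] = (bytearray(line), [parity(b) for b in line])
        if addr not in self.ecc_map:
            self.ecc_map[addr] = self._next_ecc
            self._next_ecc += 1
        self.ecc_store[self.ecc_map[addr]] = xor_all(line)

    def read(self, addr: int) -> bytes:
        line, t1 = self.data[addr]
        # Tier 1: detection only, using bits stored with the data.
        bad = [i for i, b in enumerate(line) if parity(b) != t1[i]]
        if not bad:
            return bytes(line)
        if len(bad) > 1:
            raise RuntimeError("more than one bad byte: uncorrectable in this toy")
        # Tier 2: follow the separate mapping only on a detected error.
        self.t2_reads += 1
        t2 = self.ecc_store[self.ecc_map[addr]]
        i = bad[0]
        others = xor_all(b for j, b in enumerate(line) if j != i)
        line[i] = t2 ^ others  # rebuild the bad byte and repair it in place
        return bytes(line)
```

In this sketch an error-free read never touches `ecc_store`, which is the property the abstract ties to energy savings; real chipkill-class codes are considerably stronger than the parity and XOR used here.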
Publisher
Association for Computing Machinery (ACM)
Subject
Computer Graphics and Computer-Aided Design, Software
Cited by
9 articles.
1. Hybrid Hardware/Software Detection of Multi-Bit Upsets in Memory;2024 54th Annual IEEE/IFIP International Conference on Dependable Systems and Networks Workshops (DSN-W);2024-06-24
2. Application-Based Fault Tolerance Techniques for Fully Protecting Sparse Matrix Solvers;2017 IEEE International Conference on Cluster Computing (CLUSTER);2017-09
3. Balancing the Lifetime and Storage Overhead on Error Correction for Phase Change Memory;PLOS ONE;2015-07-09
4. COMeT+: Continuous Online Memory Testing with Multi-Threading Extension;IEEE Transactions on Computers;2014-07
5. SPMCloud;ACM Transactions on Design Automation of Electronic Systems;2014-06