Affiliation:
1. School of Electrical and Computer Engineering, Georgia Institute of Technology, Atlanta, GA
2. AMD Research, Advanced Micro Devices Inc.
Abstract
Stacked memory modules are likely to be tightly integrated with the processor. It is vital that these memory modules operate reliably, as memory failure can require the replacement of the entire socket. To make matters worse, stacked memory designs are susceptible to newer failure modes (e.g., due to faulty through-silicon vias, or TSVs) that can cause large portions of memory, such as a bank, to become faulty. To avoid data loss from large-granularity failures, the memory system may use symbol-based codes that stripe the data for a cache line across several banks (or channels). Unfortunately, such data-striping reduces memory-level parallelism, causing significant slowdown and higher power consumption.
This article proposes Citadel, a robust memory architecture that allows the memory system to retain each cache line within one bank. By retaining cache lines within banks, Citadel enables a high-performance and low-power memory system and also efficiently protects the stacked memory system from large-granularity failures. Citadel consists of three components: TSV-Swap, which can tolerate both faulty data-TSVs and faulty address-TSVs; Tri-Dimensional Parity (3DP), which can tolerate column failures, row failures, and bank failures; and Dynamic Dual-Granularity Sparing (DDS), which can mitigate permanent faults by dynamically sparing faulty memory regions either at a row granularity or at a bank granularity. Our evaluations with real-world data for DRAM failures show that Citadel provides performance and power similar to maintaining the entire cache line in the same bank, and yet provides 700× higher reliability than ChipKill-like ECC codes.
Funder
SRC STARnet Centers
Center for Future Architectures Research
MARCO and DARPA
Publisher
Association for Computing Machinery (ACM)
Subject
Hardware and Architecture, Information Systems, Software
Reference: 41 articles.
Cited by
13 articles.