Rethinking DRAM design and organization for energy-constrained multi-cores

Published:2010-06-19 Issue:3 Volume:38 Page:175-186
ISSN:0163-5964
Container-title:ACM SIGARCH Computer Architecture News
language:en
Short-container-title:SIGARCH Comput. Archit. News

Author:

Udipi Aniruddha N.¹,Muralimanohar Naveen²,Chatterjee Niladrish¹,Balasubramonian Rajeev¹,Davis Al¹,Jouppi Norman P.²

Affiliation:

1. University of Utah, Salt Lake City, UT, USA

2. Hewlett-Packard Laboratories, Palo Alto, CA, USA

Abstract

DRAM vendors have traditionally optimized the cost-per-bit metric, often making design decisions that incur energy penalties. A prime example is the overfetch feature in DRAM, where a single request activates thousands of bit-lines in many DRAM chips, only to return a single cache line to the CPU. The focus on cost-per-bit is questionable in modern-day servers where operating costs can easily exceed the purchase cost. Modern technology trends are also placing very different demands on the memory system: (i) queuing delays are a significant component of memory access time, (ii) there is a high energy premium for the level of reliability expected for business-critical computing, and (iii) the memory access stream emerging from multi-core systems exhibits limited locality. All of these trends necessitate an overhaul of DRAM architecture, even if it means a slight compromise in the cost-per-bit metric. This paper examines three primary innovations. The first is a modification to DRAM chip microarchitecture that retains the traditional DDRx SDRAM interface. Selective Bit-line Activation (SBA) waits for both RAS (row address) and CAS (column address) signals to arrive before activating exactly those bit-lines that provide the requested cache line. SBA reduces energy consumption while incurring slight area and performance penalties. The second innovation, Single Subarray Access (SSA), fundamentally re-organizes the layout of DRAM arrays and the mapping of data to these arrays so that an entire cache line is fetched from a single subarray. It requires a different interface to the memory controller, reduces dynamic and background energy (by about 6X), incurs a slight area penalty (4%), and can even lead to performance improvements (54% on average) by reducing queuing delays. The third innovation further penalizes the cost-per-bit metric by adding a checksum feature to each cache line. This checksum error-detection feature can then be used to build stronger RAID-like fault tolerance, including chipkill-level reliability. Such a technique is especially crucial for the SSA architecture where the entire cache line is localized to a single chip. This DRAM chip microarchitectural change leads to a dramatic reduction in the energy and storage overheads for reliability. The proposed architectures will also apply to other emerging memory technologies (such as resistive memories) and will be less disruptive to standards, interfaces, and the design flow if they can be incorporated into first-generation designs.
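
The checksum-plus-RAID idea in the abstract can be illustrated with a minimal Python sketch. It is not the paper's implementation: the 64-byte line size, the 8-bit additive checksum, the XOR parity line, and the names `checksum`, `write_stripe`, and `read_line` are all assumptions chosen for illustration. Each cache line is assumed to live on its own chip (as in SSA), a per-line checksum detects which line is bad, and a parity line on one extra chip is used to rebuild it.

```python
from functools import reduce

LINE_BYTES = 64  # assumed cache-line size, not taken from the abstract


def checksum(line: bytes) -> int:
    """Hypothetical 8-bit additive checksum stored alongside each line."""
    return sum(line) % 256


def xor_lines(lines):
    """Byte-wise XOR of equally sized lines (RAID-style parity)."""
    return bytes(reduce(lambda a, b: a ^ b, column) for column in zip(*lines))


def write_stripe(data_lines):
    """Store each data line with its checksum, plus one parity line.

    Each list entry stands for one chip: (line, checksum).
    """
    stripe = [(line, checksum(line)) for line in data_lines]
    parity = xor_lines(data_lines)
    stripe.append((parity, checksum(parity)))
    return stripe


def read_line(stripe, index):
    """Return line `index`, rebuilding it from the other chips if needed."""
    line, stored = stripe[index]
    if checksum(line) == stored:
        return line  # common case: only one chip is accessed
    # The checksum mismatch localizes the error to this chip, so the line
    # can be rebuilt by XOR-ing every other line in the stripe.
    others = [entry for i, entry in enumerate(stripe) if i != index]
    for other_line, other_sum in others:
        if checksum(other_line) != other_sum:
            raise ValueError("more than one faulty line; cannot rebuild")
    return xor_lines([other_line for other_line, _ in others])
```

In this sketch an error-free read touches a single entry of the stripe, and the remaining entries are read only after a checksum mismatch. That mirrors the abstract's point that detection via a checksum lets stronger fault tolerance be layered on top of a design where a whole cache line sits in one chip.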

Publisher

Association for Computing Machinery (ACM)

Link

https://dl.acm.org/doi/pdf/10.1145/1816038.1815983

References: 56 articles.

1. CACTI: An Integrated Cache and Memory Access Time Cycle Time Area Leakage and Dynamic Power Model. http://www.hpl.hp.com/research/cacti/.

2. HP Advanced Memory Protection Technologies - Technology Brief. http://www.hp.com.

3. Micron System Power Calculator. http://www.micron.com/support/part info/powercalc.

4. STREAM - Sustainable Memory Bandwidth in High Performance Computers. http://www.cs.virginia.edu/stream/.

5. Virtutech Simics Full System Simulator. http://www.virtutech.com.

Cited by 91 articles.

1. DRAMScope: Uncovering DRAM Microarchitecture and Characteristics by Issuing Memory Commands;2024 ACM/IEEE 51st Annual International Symposium on Computer Architecture (ISCA);2024-06-29

2. FASA-DRAM: Reducing DRAM Latency with Destructive Activation and Delayed Restoration;ACM Transactions on Architecture and Code Optimization;2024-05-21

3. MIMDRAM: An End-to-End Processing-Using-DRAM System for High-Throughput, Energy-Efficient and Programmer-Transparent Multiple-Instruction Multiple-Data Computing;2024 IEEE International Symposium on High-Performance Computer Architecture (HPCA);2024-03-02

4. CoolDRAM: An Energy-Efficient and Robust DRAM;2023 IEEE/ACM International Symposium on Low Power Electronics and Design (ISLPED);2023-08-07

5. Accelerating Personalized Recommendation with Cross-level Near-Memory Processing;Proceedings of the 50th Annual International Symposium on Computer Architecture;2023-06-17