Rethinking DRAM design and organization for energy-constrained multi-cores

Author:

Udipi Aniruddha N. (1), Muralimanohar Naveen (2), Chatterjee Niladrish (1), Balasubramonian Rajeev (1), Davis Al (1), Jouppi Norman P. (2)

Affiliation:

1. University of Utah, Salt Lake City, UT, USA

2. Hewlett-Packard Laboratories, Palo Alto, CA, USA

Abstract

DRAM vendors have traditionally optimized the cost-per-bit metric, often making design decisions that incur energy penalties. A prime example is the overfetch feature in DRAM, where a single request activates thousands of bit-lines in many DRAM chips, only to return a single cache line to the CPU. The focus on cost-per-bit is questionable in modern-day servers where operating costs can easily exceed the purchase cost. Modern technology trends are also placing very different demands on the memory system: (i) queuing delays are a significant component of memory access time, (ii) there is a high energy premium for the level of reliability expected for business-critical computing, and (iii) the memory access stream emerging from multi-core systems exhibits limited locality. All of these trends necessitate an overhaul of DRAM architecture, even if it means a slight compromise in the cost-per-bit metric. This paper examines three primary innovations. The first is a modification to DRAM chip microarchitecture that retains the traditional DDRx SDRAM interface. Selective Bit-line Activation (SBA) waits for both RAS (row address) and CAS (column address) signals to arrive before activating exactly those bit-lines that provide the requested cache line. SBA reduces energy consumption while incurring slight area and performance penalties. The second innovation, Single Subarray Access (SSA), fundamentally re-organizes the layout of DRAM arrays and the mapping of data to these arrays so that an entire cache line is fetched from a single subarray. It requires a different interface to the memory controller, reduces dynamic and background energy (by about 6X), incurs a slight area penalty (4%), and can even lead to performance improvements (54% on average) by reducing queuing delays. The third innovation further penalizes the cost-per-bit metric by adding a checksum feature to each cache line. This checksum error-detection feature can then be used to build stronger RAID-like fault tolerance, including chipkill-level reliability. Such a technique is especially crucial for the SSA architecture where the entire cache line is localized to a single chip. This DRAM chip microarchitectural change leads to a dramatic reduction in the energy and storage overheads for reliability. The proposed architectures will also apply to other emerging memory technologies (such as resistive memories) and will be less disruptive to standards, interfaces, and the design flow if they can be incorporated into first-generation designs.

Publisher

Association for Computing Machinery (ACM)


Cited by 91 articles.

1. DRAMScope: Uncovering DRAM Microarchitecture and Characteristics by Issuing Memory Commands;2024 ACM/IEEE 51st Annual International Symposium on Computer Architecture (ISCA);2024-06-29

2. FASA-DRAM: Reducing DRAM Latency with Destructive Activation and Delayed Restoration;ACM Transactions on Architecture and Code Optimization;2024-05-21

3. MIMDRAM: An End-to-End Processing-Using-DRAM System for High-Throughput, Energy-Efficient and Programmer-Transparent Multiple-Instruction Multiple-Data Computing;2024 IEEE International Symposium on High-Performance Computer Architecture (HPCA);2024-03-02

4. CoolDRAM: An Energy-Efficient and Robust DRAM;2023 IEEE/ACM International Symposium on Low Power Electronics and Design (ISLPED);2023-08-07

5. Accelerating Personalized Recommendation with Cross-level Near-Memory Processing;Proceedings of the 50th Annual International Symposium on Computer Architecture;2023-06-17
