Understanding Latency Variation in Modern DRAM Chips

Author:

Chang Kevin K. (1), Kashyap Abhijith (1), Hassan Hasan (2), Ghose Saugata (1), Hsieh Kevin (3), Lee Donghyuk (1), Li Tianshi (4), Pekhimenko Gennady (1), Khan Samira (5), Mutlu Onur (6)

Affiliation:

1. Carnegie Mellon University, Pittsburgh, PA, USA

2. Carnegie Mellon University & TOBB ETU, Ankara, Turkey

3. Carnegie Mellon University, Pittsburgh, USA

4. Peking University & Carnegie Mellon University, Pittsburgh, PA, USA

5. University of Virginia, Charlottesville, VA, USA

6. ETH Zurich & Carnegie Mellon University, Pittsburgh, PA, USA

Abstract

Long DRAM latency is a critical performance bottleneck in current systems. DRAM access latency is defined by three fundamental operations that take place within the DRAM cell array: (i) activation of a memory row, which opens the row to perform accesses; (ii) precharge, which prepares the cell array for the next memory access; and (iii) restoration of the row, which restores the values of cells in the row that were destroyed due to activation. There is significant latency variation for each of these operations across the cells of a single DRAM chip due to irregularity in the manufacturing process. As a result, some cells are inherently faster to access, while others are inherently slower. Unfortunately, existing systems do not exploit this variation. The goal of this work is to (i) experimentally characterize and understand the latency variation across cells within a DRAM chip for these three fundamental DRAM operations, and (ii) develop new mechanisms that exploit our understanding of the latency variation to reliably improve performance. To this end, we comprehensively characterize 240 DRAM chips from three major vendors, and make several new observations about latency variation within DRAM. We find that (i) there is large latency variation across the cells for each of the three operations; (ii) variation characteristics exhibit significant spatial locality: slower cells are clustered in certain regions of a DRAM chip; and (iii) the three fundamental operations exhibit different reliability characteristics when the latency of each operation is reduced. Based on our observations, we propose Flexible-LatencY DRAM (FLY-DRAM), a mechanism that exploits latency variation across DRAM cells within a DRAM chip to improve system performance. The key idea of FLY-DRAM is to exploit the spatial locality of slower cells within DRAM, and access the faster DRAM regions with reduced latencies for the fundamental operations. 
Our evaluations show that FLY-DRAM improves the performance of a wide range of applications by 13.3%, 17.6%, and 19.5%, on average, for each of the three different vendors' real DRAM chips, in a simulated 8-core system. We conclude that the experimental characterization and analysis of latency variation within modern DRAM, provided by this work, can lead to new techniques that improve DRAM and system performance.
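The abstract states FLY-DRAM's key idea but not its implementation. The Python sketch below is a minimal illustration of that idea under stated assumptions, not the authors' design. It maps the three fundamental operations to the standard DDR timing parameters tRCD (activation), tRP (precharge), and tRAS (restoration). It also assumes a hypothetical profiling pass has already recorded, for each region, the minimum latencies at which that region operated reliably. The timing values, the (bank, row group) region granularity, and all names (`Timings`, `region_of`, `build_latency_table`, `timings_for`, `ROWS_PER_REGION`) are hypothetical placeholders.

```python
from dataclasses import dataclass
from typing import Dict, Tuple

# Hypothetical region key: (bank, row group). The granularity is an assumption
# for illustration; it relies on slow cells being spatially clustered, so a
# coarse table can still separate fast regions from slow ones.
Region = Tuple[int, int]
ROWS_PER_REGION = 512  # hypothetical row-group size


@dataclass(frozen=True)
class Timings:
    """Latencies (ns) of the three fundamental DRAM operations."""
    trcd: float  # activation: ACT to first READ/WRITE
    trp: float   # precharge: PRE to next ACT
    tras: float  # restoration: ACT to PRE

    def fits_within(self, other: "Timings") -> bool:
        """True if every latency here is <= the matching latency in `other`."""
        return (self.trcd <= other.trcd
                and self.trp <= other.trp
                and self.tras <= other.tras)


# Placeholder values only; not taken from the paper or from any datasheet.
STANDARD = Timings(trcd=13.75, trp=13.75, tras=35.0)
REDUCED = Timings(trcd=10.0, trp=10.0, tras=30.0)


def region_of(bank: int, row: int) -> Region:
    """Map a (bank, row) address to its hypothetical region key."""
    return (bank, row // ROWS_PER_REGION)


def build_latency_table(profile: Dict[Region, Timings]) -> Dict[Region, Timings]:
    """Give REDUCED timings to regions whose profiled minimum reliable
    latencies all fit within REDUCED; all other regions keep STANDARD."""
    return {
        region: REDUCED if measured.fits_within(REDUCED) else STANDARD
        for region, measured in profile.items()
    }


def timings_for(table: Dict[Region, Timings], bank: int, row: int) -> Timings:
    """Timings a controller would use for an access; regions missing from
    the table fall back to STANDARD so the default stays conservative."""
    return table.get(region_of(bank, row), STANDARD)


# Hypothetical profiling results: minimum latencies at which each region
# was observed to operate reliably.
profile = {
    (0, 0): Timings(trcd=9.0, trp=9.5, tras=28.0),    # fast region
    (0, 1): Timings(trcd=12.5, trp=11.0, tras=33.0),  # cluster of slow cells
}
table = build_latency_table(profile)

print(timings_for(table, bank=0, row=100).trcd)  # 10.0 (fast region)
print(timings_for(table, bank=0, row=600).trcd)  # 13.75 (slow region)
print(timings_for(table, bank=3, row=0).trcd)    # 13.75 (unprofiled)
```

Because slower cells cluster spatially, a coarse per-region table like this one can stay small. Any region that contains even one slow cell keeps the standard timings, which gives up some speedup in exchange for reliability.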

Funder

National Science Foundation

Publisher

Association for Computing Machinery (ACM)

Subject

Computer Networks and Communications, Hardware and Architecture, Software

Cited by 18 articles.

1. Full-Stack Revision of Memory and Data Management in PDES on Multi-Core Machines. Proceedings of the 33rd International Symposium on High-Performance Parallel and Distributed Computing, 2024-06-03.

2. Spatial Variation-Aware Read Disturbance Defenses: Experimental Analysis of Real DRAM Chips and Implications on Future Solutions. 2024 IEEE International Symposium on High-Performance Computer Architecture (HPCA), 2024-03-02.

3. Functionally-Complete Boolean Logic in Real DRAM Chips: Experimental Characterization and Analysis. 2024 IEEE International Symposium on High-Performance Computer Architecture (HPCA), 2024-03-02.

4. iNUMAlloc: Towards Intelligent Memory Allocation for AI Accelerators with NUMA. 2023 IEEE Intl Conf on Parallel & Distributed Processing with Applications, Big Data & Cloud Computing, Sustainable Computing & Communications, Social Computing & Networking (ISPA/BDCloud/SocialCom/SustainCom), 2023-12-21.

5. Mitigation of Rowhammer Attack on DDR4 Memory: A Novel Multi-Table Frequent Element Algorithm Based Approach. 2023 IEEE 66th International Midwest Symposium on Circuits and Systems (MWSCAS), 2023-08-06.
