H3DM: A High-bandwidth High-capacity Hybrid 3D Memory Design for GPUs

Author:

Akbarzadeh Negar (1), Darabi Sina (2), Gheibi-Fetrat Atiyeh (1), Mirzaei Amir (1), Sadrosadati Mohammad (3), Sarbazi-Azad Hamid (4)

Affiliation:

1. Sharif University of Technology, Tehran, Iran

2. Institute for Research in Fundamental Sciences (IPM), Università della Svizzera italiana (USI), Tehran, Iran

3. Institute for Research in Fundamental Sciences (IPM), Tehran, Iran

4. Sharif University of Technology, Institute for Research in Fundamental Sciences (IPM), Iran

Abstract

Graphics Processing Units (GPUs) are widely used for modern applications with huge data sizes. However, the performance benefit of GPUs is limited by their memory capacity and bandwidth. Although GPU vendors have improved memory capacity and bandwidth using 3D memory technology (HBM), many important workloads with terabytes of data still cannot fit in the provided capacity and are bound by the provided bandwidth. With limited GPU memory capacity, programmers must manage data movement between GPU and host memories themselves, which is a significant programming burden. To ease programming, GPUs use a unified address space with the host that allows over-subscribing GPU memory, but this approach performs poorly once GPUs encounter memory page faults. Many recent works have tried to remedy the capacity and bandwidth bottlenecks using dense non-volatile memories (NVMs) and true-3D stacking. However, these works mainly focus on one bottleneck or do not provide a scalable solution that fits future requirements. In this paper, we investigate true-3D stacking of dense, low-power, and refresh-free non-volatile phase change memory (PCM) on top of state-of-the-art GPU configurations to provide higher capacity and bandwidth within the available area and power budget. The higher density and lower power consumption of PCM provide higher capacity by integrating more cells in each 3D layer and by allowing more layers to be stacked. However, we observe that stacking more than six layers of pure-PCM memory violates the thermal constraint and severely harms performance and power efficiency due to PCM's higher write latency and energy. Further, it degrades the lifetime of the GPU to less than one year.
Hybrid architectures that leverage the benefits of both DRAM and PCM have been widely studied in prior proposals; however, true-3D integration of such a hybrid memory architecture, especially on top of a state-of-the-art GPU architecture, has not yet been investigated. We experimentally demonstrate that by covering 80% of write requests in DRAM and eliminating refresh overhead, true-3D stacking of eight 32GB layers of PCM along with two 8GB layers of DRAM is possible, resulting in a total memory capacity of 272GB. Based on the explored design requirements, we propose a 3D high-bandwidth high-capacity hybrid memory (H3DM) system that uses a hybrid-3D (H3D)-aware remapping scheme to reduce expensive PCM writes to under 20% while avoiding DRAM refresh overhead. H3DM improves performance by up to 291% compared to the baseline GPU architecture while remaining, on average, within only 3% of an ideal case with DRAM-like access latency. Moreover, when the dataset size exceeds the baseline GPU memory space, H3DM improves performance and power by up to 648% and 87%, respectively, compared to the baseline GPU architecture, since it avoids expensive data transfers over off-chip communication links.
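
The capacity and write-coverage figures quoted in the abstract can be checked with a few lines of arithmetic. The following is a minimal sketch, not the authors' model: the layer counts, layer sizes, and the 80% DRAM write coverage are taken from the abstract, while the function names and the assumption that every write not absorbed by DRAM lands in PCM are illustrative only.

```python
# Illustrative arithmetic only; layer counts and sizes are the ones quoted in the abstract.
PCM_LAYERS, PCM_LAYER_GB = 8, 32
DRAM_LAYERS, DRAM_LAYER_GB = 2, 8


def stack_capacity_gb(pcm_layers, pcm_layer_gb, dram_layers, dram_layer_gb):
    """Total capacity of a hybrid PCM + DRAM stack, in GB."""
    return pcm_layers * pcm_layer_gb + dram_layers * dram_layer_gb


def pcm_write_fraction(dram_write_coverage):
    """Fraction of writes reaching PCM.

    Assumption (not from the paper): every write that DRAM does not
    absorb is served by PCM.
    """
    return 1.0 - dram_write_coverage


total_gb = stack_capacity_gb(PCM_LAYERS, PCM_LAYER_GB, DRAM_LAYERS, DRAM_LAYER_GB)
dram_share = (DRAM_LAYERS * DRAM_LAYER_GB) / total_gb

print(f"total capacity: {total_gb} GB")
print(f"DRAM share of capacity: {dram_share:.1%}")
print(f"writes reaching PCM at 80% DRAM coverage: {pcm_write_fraction(0.80):.0%}")
```

Under these assumptions the DRAM layers account for only a small share of the total capacity, so the design depends on DRAM absorbing most of the write traffic rather than on holding most of the data.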

Publisher

Association for Computing Machinery (ACM)

