3D-ReG

Authors:

Bing Li1, Janardhan Rao Doppa2, Partha Pratim Pande2, Krishnendu Chakrabarty3, Joe X. Qiu4, Hai (Helen) Li3

Affiliations:

1. Capital Normal University, Beijing, China

2. Washington State University, Pullman, WA

3. Duke University, Durham, NC

4. Duke University and Army Research Office, Research Triangle Park, Durham, NC

Abstract

Deep neural network (DNN) models are being applied to an ever broader range of applications, and the computational capability of traditional hardware platforms cannot keep pace with the growth in model complexity. Among recent technologies for accelerating DNNs, resistive memory (ReRAM)-based processing-in-memory (PIM) has emerged as a promising solution for DNN inference due to its high efficiency for matrix-based computation. Extending ReRAM-based accelerators to training poses two major technical challenges: (1) full-precision data is essential for back-propagation; and (2) the need to support both feed-forward and back-propagation aggravates the data-movement burden. We propose a heterogeneous architecture named 3D-ReG, which leverages a full-precision GPU to ensure training accuracy and low-overhead 3D integration to provide low-cost data movement. Moreover, we introduce conservative and aggressive task-mapping schemes, which partition the computation phases in different ways to balance execution efficiency and training accuracy. We evaluate 3D-ReG implemented with two 3D integration technologies, through-silicon vias (TSVs) and monolithic inter-tier vias (MIVs), and compare them with GPU-only and PIM-only counterparts. Various GPU-only platforms using two main-memory technologies (DRAM and ReRAM) and three interconnect technologies (2D, TSV, and MIV) are evaluated as well. Experimental results show that 3D-ReG achieves, on average, 5.64× training speedup and 3.56× higher energy efficiency compared with a GPU using DRAM as main memory, at the cost of a 0.05%–3.39% accuracy drop. We also define a new metric, the gain-loss ratio (GLR), which quantitatively evaluates DNN training hardware in terms of both model accuracy and hardware efficiency. Our comparison shows that the aggressive task-mapping scheme on MIV-based 3D-ReG outperforms the other configurations.
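
The abstract quantifies the efficiency-versus-accuracy trade-off but does not spell out how the gain-loss ratio (GLR) is computed. The following is a minimal, purely illustrative sketch, not the paper's actual definition: it assumes GLR is taken as a combined hardware-efficiency gain divided by the accuracy loss relative to the GPU-with-DRAM baseline, and applies it to the headline numbers above.

```python
# Illustrative sketch only: the GLR formula below is an assumption,
# not the definition given in the 3D-ReG paper.

def gain_loss_ratio(speedup, energy_gain, accuracy_drop_pct):
    """Hypothetical gain-loss ratio: hardware-efficiency gain per
    percentage point of accuracy lost versus the GPU+DRAM baseline."""
    efficiency_gain = speedup * energy_gain            # assumed combined hardware gain
    return efficiency_gain / max(accuracy_drop_pct, 1e-6)

# Headline numbers from the abstract: 5.64x speedup, 3.56x energy
# efficiency, and an accuracy drop between 0.05% and 3.39%.
for drop in (0.05, 3.39):
    print(f"accuracy drop {drop:.2f}% -> illustrative GLR = "
          f"{gain_loss_ratio(5.64, 3.56, drop):.1f}")
```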

Funder

US Department of Energy

NRC Associate Fellowship Award

Publisher

Association for Computing Machinery (ACM)

Subject

Electrical and Electronic Engineering, Hardware and Architecture, Software

Cited by 18 articles.

1. TEFLON: Thermally Efficient Dataflow-aware 3D NoC for Accelerating CNN Inferencing on Manycore PIM Architectures;ACM Transactions on Embedded Computing Systems;2024-08-14

2. On Continuing DNN Accelerator Architecture Scaling Using Tightly Coupled Compute-on-Memory 3-D ICs;IEEE Transactions on Very Large Scale Integration (VLSI) Systems;2023-10

3. Florets for Chiplets: Data Flow-aware High-Performance and Energy-efficient Network-on-Interposer for CNN Inference Tasks;ACM Transactions on Embedded Computing Systems;2023-09-09

4. Evaluating Machine Learning Workloads on Memory-Centric Computing Systems;2023 IEEE International Symposium on Performance Analysis of Systems and Software (ISPASS);2023-04

5. Scalable and Energy-Efficient NN Acceleration with GPU-ReRAM Architecture;Applied Reconfigurable Computing. Architectures, Tools, and Applications;2023
