3D-ReG

Authors:

Bing Li1, Janardhan Rao Doppa2, Partha Pratim Pande2, Krishnendu Chakrabarty3, Joe X. Qiu4, Hai (Helen) Li3

Affiliations:

1. Capital Normal University, Beijing, China

2. Washington State University, Pullman, WA

3. Duke University, Durham, NC

4. Duke University and Army Research Office, Research Triangle Park, Durham, NC

Abstract

Deep neural network (DNN) models are being applied to an ever broader range of applications, and the computational capability of traditional hardware platforms cannot keep pace with the growth in model complexity. Among recent technologies for accelerating DNNs, resistive memory (ReRAM)-based processing-in-memory (PIM) has emerged as a promising solution for DNN inference due to its high efficiency for matrix-based computation. Extending ReRAM-based accelerators to training poses two major technical challenges: (1) full-precision data is essential for back-propagation; and (2) the need to support both feed-forward and back-propagation aggravates the data-movement burden. We propose a heterogeneous architecture named 3D-ReG, which leverages a full-precision GPU to ensure training accuracy and low-overhead 3D integration to provide low-cost data movement. Moreover, we introduce conservative and aggressive task-mapping schemes, which partition the computation phases in different ways to balance execution efficiency and training accuracy. We evaluate 3D-ReG implemented with two 3D integration technologies, through-silicon vias (TSVs) and monolithic inter-tier vias (MIVs), and compare them with GPU-only and PIM-only counterparts. Various GPU-only platforms using two main-memory technologies (DRAM and ReRAM) and three interconnect technologies (2D, TSV, and MIV) are evaluated as well. Experimental results show that 3D-ReG achieves, on average, 5.64× training speedup and 3.56× higher energy efficiency compared with a GPU using DRAM as main memory, at the cost of a 0.05%–3.39% accuracy drop. We also define a new metric, the gain-loss ratio (GLR), which quantitatively evaluates DNN training hardware in terms of both model accuracy and hardware efficiency. Our comparison shows that the aggressive task-mapping scheme on MIV-based 3D-ReG outperforms the other configurations.
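
The abstract quantifies the efficiency-versus-accuracy trade-off but does not spell out how the gain-loss ratio (GLR) is computed. The following is a minimal, purely illustrative sketch, not the paper's actual definition: it assumes GLR is taken as a combined hardware-efficiency gain divided by the accuracy loss relative to the GPU-with-DRAM baseline, and applies it to the headline numbers above.

```python
# Illustrative sketch only: the GLR formula below is an assumption,
# not the definition given in the 3D-ReG paper.

def gain_loss_ratio(speedup, energy_gain, accuracy_drop_pct):
    """Hypothetical gain-loss ratio: hardware-efficiency gain per
    percentage point of accuracy lost versus the GPU+DRAM baseline."""
    efficiency_gain = speedup * energy_gain            # assumed combined hardware gain
    return efficiency_gain / max(accuracy_drop_pct, 1e-6)

# Headline numbers from the abstract: 5.64x speedup, 3.56x energy
# efficiency, and an accuracy drop between 0.05% and 3.39%.
for drop in (0.05, 3.39):
    print(f"accuracy drop {drop:.2f}% -> illustrative GLR = "
          f"{gain_loss_ratio(5.64, 3.56, drop):.1f}")
```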

Funder

US Department of Energy

NRC Associate Fellowship Award

Publisher

Association for Computing Machinery (ACM)

Subject

Electrical and Electronic Engineering, Hardware and Architecture, Software

Cited by 18 articles.

1. TEFLON: Thermally Efficient Dataflow-aware 3D NoC for Accelerating CNN Inferencing on Manycore PIM Architectures;ACM Transactions on Embedded Computing Systems;2024-08-14

2. On Continuing DNN Accelerator Architecture Scaling Using Tightly Coupled Compute-on-Memory 3-D ICs;IEEE Transactions on Very Large Scale Integration (VLSI) Systems;2023-10

3. Florets for Chiplets: Data Flow-aware High-Performance and Energy-efficient Network-on-Interposer for CNN Inference Tasks;ACM Transactions on Embedded Computing Systems;2023-09-09

4. Evaluating Machine Learning Workloads on Memory-Centric Computing Systems;2023 IEEE International Symposium on Performance Analysis of Systems and Software (ISPASS);2023-04

5. Scalable and Energy-Efficient NN Acceleration with GPU-ReRAM Architecture;Applied Reconfigurable Computing. Architectures, Tools, and Applications;2023
