CLU

Published:2021-04 Issue:2 Volume:17 Page:1-25
ISSN:1550-4832
Container-title:ACM Journal on Emerging Technologies in Computing Systems
language:en
Short-container-title:J. Emerg. Technol. Comput. Syst.

Author:

Das Palash¹, Kapoor Hemangee K.¹

Affiliation:

1. Indian Institute of Technology Guwahati, Guwahati, Assam

Abstract

Convolutional/Deep Neural Networks (CNNs/DNNs) are rapidly growing workloads for emerging AI-based systems. The gap between processing speed and memory-access latency in multi-core systems affects the performance and energy efficiency of CNN/DNN tasks. This article aims to alleviate this gap by providing a simple yet efficient near-memory accelerator-based system that expedites CNN inference. Towards this goal, we first design an efficient parallel algorithm to accelerate CNN/DNN tasks. The data is partitioned across the multiple memory channels (vaults) to assist in the execution of the parallel algorithm. Second, we design a hardware unit, namely the convolutional logic unit (CLU), which implements the parallel algorithm. To optimize inference, the CLU works in three phases for layer-wise processing of data. Last, to harness the benefits of near-memory processing (NMP), we integrate homogeneous CLUs on the logic layer of the 3D memory, specifically the Hybrid Memory Cube (HMC). The combined effect of these design choices is a high-performing and energy-efficient system for CNNs/DNNs. The proposed system achieves substantial performance gains and energy reduction compared to multi-core CPU- and GPU-based systems, with a minimal area overhead of 2.37%.

Publisher

Association for Computing Machinery (ACM)

Subject

Electrical and Electronic Engineering, Hardware and Architecture, Software

Link

https://dl.acm.org/doi/pdf/10.1145/3427472

Reference: 64 articles.

1. Tensorflow: A system for large-scale machine learning;Abadi Martín;OSDI,2016

2. A scalable processing-in-memory accelerator for parallel graph processing

3. PIM-enabled instructions

4. IMCE: Energy-efficient bit-wise in-memory convolution engine for deep neural network

5. CMP-PIM

Cited by 5 articles.

1. Hardware-Software Co-Design of a Collaborative DNN Accelerator for 3D Stacked Memories with Multi-Channel Data;2024 29th Asia and South Pacific Design Automation Conference (ASP-DAC);2024-01-22

2. PreCog: Near-Storage Accelerator for Heterogeneous CNN Inference;2023 IEEE 34th International Conference on Application-specific Systems, Architectures and Processors (ASAP);2023-07

3. DDAM: Data Distribution-Aware Mapping of CNNs on Processing-In-Memory Systems;ACM Transactions on Design Automation of Electronic Systems;2023-03-19

4. A CNN Hardware Accelerator Using Triangle-based Convolution;ACM Journal on Emerging Technologies in Computing Systems;2022-10-13

5. A Near Memory Computing FPGA Architecture for Neural Network Acceleration;2022 2nd International Conference on Frontiers of Electronics, Information and Computation Technologies (ICFEICT);2022-08