Affiliation:
1. Indian Institute of Technology Guwahati, Guwahati, Assam
Abstract
Convolutional/Deep Neural Networks (CNNs/DNNs) are rapidly growing workloads for emerging AI-based systems. The gap between processing speed and memory-access latency in multi-core systems limits the performance and energy efficiency of CNN/DNN tasks. This article aims to alleviate this gap with a simple yet efficient near-memory accelerator-based system that expedites CNN inference. Toward this goal, we first design an efficient parallel algorithm to accelerate CNN/DNN tasks, partitioning the data across multiple memory channels (vaults) to support its execution. Second, we design a hardware unit, the convolutional logic unit (CLU), which implements the parallel algorithm; to optimize inference, the CLU operates in three phases for layer-wise processing of the data. Last, to harness the benefits of near-memory processing (NMP), we integrate homogeneous CLUs on the logic layer of a 3D memory, specifically the Hybrid Memory Cube (HMC). Together, these yield a high-performing and energy-efficient system for CNNs/DNNs. The proposed system achieves substantial gains in performance and energy reduction compared to multi-core CPU- and GPU-based systems, with a minimal area overhead of 2.37%.
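The abstract describes partitioning CNN data across the HMC's memory vaults so that per-vault CLUs can process a layer in parallel on the logic layer. The Python sketch below illustrates one plausible way such a partition could look: a row-wise split of the input feature map, with each slice convolved independently as a stand-in for a per-vault CLU. The vault count, function names, and the row-wise partitioning scheme are assumptions for illustration only; the abstract does not specify the paper's actual data layout or CLU micro-architecture.

```python
# Illustrative sketch only: the row-wise split across vaults and the vault
# count below are assumptions, not the paper's documented partitioning scheme.
import numpy as np

NUM_VAULTS = 8  # assumed number of HMC vaults used for the layer

def partition_across_vaults(ifmap, num_vaults=NUM_VAULTS):
    """Split the input feature map (H x W x C) row-wise into per-vault slices."""
    return np.array_split(ifmap, num_vaults, axis=0)

def clu_convolve(vault_slice, kernel):
    """Stand-in for one CLU: a naive 2D convolution over its local slice.
    A real system would also handle halo rows shared between adjacent vaults."""
    h, w, _ = vault_slice.shape
    kh, kw, _ = kernel.shape
    out = np.zeros((h - kh + 1, w - kw + 1))
    for i in range(out.shape[0]):
        for j in range(out.shape[1]):
            out[i, j] = np.sum(vault_slice[i:i + kh, j:j + kw, :] * kernel)
    return out

# Each vault's CLU works on its own slice in parallel; partial outputs are
# gathered afterwards (halo handling between slices is omitted for brevity).
ifmap = np.random.rand(64, 64, 3)
kernel = np.random.rand(3, 3, 3)
partial_outputs = [clu_convolve(s, kernel) for s in partition_across_vaults(ifmap)]
```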
Publisher
Association for Computing Machinery (ACM)
Subject
Electrical and Electronic Engineering, Hardware and Architecture, Software
Cited by
5 articles.
1. Hardware-Software Co-Design of a Collaborative DNN Accelerator for 3D Stacked Memories with Multi-Channel Data;2024 29th Asia and South Pacific Design Automation Conference (ASP-DAC);2024-01-22
2. PreCog: Near-Storage Accelerator for Heterogeneous CNN Inference;2023 IEEE 34th International Conference on Application-specific Systems, Architectures and Processors (ASAP);2023-07
3. DDAM: Data Distribution-Aware Mapping of CNNs on Processing-In-Memory Systems;ACM Transactions on Design Automation of Electronic Systems;2023-03-19
4. A CNN Hardware Accelerator Using Triangle-based Convolution;ACM Journal on Emerging Technologies in Computing Systems;2022-10-13
5. A Near Memory Computing FPGA Architecture for Neural Network Acceleration;2022 2nd International Conference on Frontiers of Electronics, Information and Computation Technologies (ICFEICT);2022-08