Low Complexity Multiply-Accumulate Units for Convolutional Neural Networks with Weight-Sharing

Published:2018-09-30 Issue:3 Volume:15 Page:1-24
ISSN:1544-3566
Container-title:ACM Transactions on Architecture and Code Optimization
language:en
Short-container-title:ACM Trans. Archit. Code Optim.

Author:

Garland James¹, Gregg David¹

Affiliation:

1. Trinity College Dublin, Ireland

Abstract

Convolutional neural networks (CNNs) are one of the most successful machine-learning techniques for image, voice, and video processing. CNNs require large amounts of processing capacity and memory bandwidth. Hardware accelerators have been proposed for CNNs that typically contain large numbers of multiply-accumulate (MAC) units, the multipliers of which are large in integrated circuit (IC) gate count and power consumption. “Weight-sharing” accelerators have been proposed where the full range of weight values in a trained CNN is compressed and put into bins, and the bin index is used to access the weight-shared value. We reduce power and area of the CNN by implementing parallel accumulate shared MAC (PASM) in a weight-shared CNN. PASM re-architects the MAC to instead count the frequency of each weight and place it in a bin. The accumulated value is computed in a subsequent multiply phase, significantly reducing gate count and power consumption of the CNN. In this article, we implement PASM in a weight-shared CNN convolution hardware accelerator and analyze its effectiveness. Experiments show that, for a clock speed of 1 GHz implemented on a 45 nm ASIC process, our approach results in fewer gates, smaller logic, and reduced power with only a slight increase in latency. We also show that the same weight-shared-with-PASM CNN accelerator can be implemented in resource-constrained FPGAs, where the FPGA has limited numbers of digital signal processor (DSP) units to accelerate the MAC operations.
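The accumulate-then-multiply idea in the abstract can be illustrated with a small software sketch. This is not the authors' hardware design. It assumes one reading of the abstract: each bin accumulates the input values whose weight maps to that bin, and a later multiply phase scales each bin total by its shared weight. The function names, the codebook, and the sample values are all hypothetical.

```python
def weight_shared_mac(inputs, bin_indices, codebook):
    """Conventional weight-shared MAC: one multiply per input value."""
    acc = 0
    for x, b in zip(inputs, bin_indices):
        acc += x * codebook[b]  # look up the shared weight, then multiply-accumulate
    return acc


def pasm(inputs, bin_indices, codebook):
    """Accumulate-then-multiply sketch: one multiply per bin (assumed reading)."""
    bins = [0] * len(codebook)
    # Accumulate phase: additions only, routed by the weight's bin index.
    for x, b in zip(inputs, bin_indices):
        bins[b] += x
    # Multiply phase: one multiplication per bin, then a final sum.
    return sum(total * w for total, w in zip(bins, codebook))


# Hypothetical data: 4 shared weights (bins) and 6 input values.
codebook = [-3, 0, 2, 5]
inputs = [1, 4, 2, 7, 3, 5]
bin_indices = [2, 0, 2, 3, 1, 0]

mac_result = weight_shared_mac(inputs, bin_indices, codebook)
pasm_result = pasm(inputs, bin_indices, codebook)
print(mac_result, pasm_result)
```

Both functions compute the same dot product. In this sketch the conventional version performs one multiplication per input (6 here), while the accumulate-then-multiply version performs one per bin (4 here), which is the kind of saving that matters when the number of inputs is much larger than the number of bins.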

Funder

Science Foundation Ireland

Publisher

Association for Computing Machinery (ACM)

Subject

Hardware and Architecture, Information Systems, Software

Link

https://dl.acm.org/doi/pdf/10.1145/3233300

Reference: 22 articles.

1. DianNao

2. Eyeriss: A Spatial Architecture for Energy-Efficient Dataflow for Convolutional Neural Networks

Cited by 23 articles.

1. Estimation of aquatic ecosystem health using deep neural network with nonlinear data mapping;Ecological Informatics;2024-07

2. An ASIC Accelerator for QNN With Variable Precision and Tunable Energy Efficiency;IEEE Transactions on Computer-Aided Design of Integrated Circuits and Systems;2024-07

3. Flexible Quantization for Efficient Convolutional Neural Networks;Electronics;2024-05-14

4. A Precision-Aware Neuron Engine for DNN Accelerators;SN Computer Science;2024-04-26

5. Modern Trends in Improving the Technical Characteristics of Devices and Systems for Digital Image Processing;IEEE Access;2024