Multiply-and-Fire: An Event-Driven Sparse Neural Network Accelerator

Authors:

Miao Yu1, Tingting Xiang1, Venkata Pavan Kumar Miriyala1, Trevor E. Carlson1

Affiliation:

1. National University of Singapore, Singapore

Abstract

Deep neural network inference has become a vital workload for many systems, from edge devices to data centers. To reduce the compute and power demands of deep neural networks (DNNs) running on these systems, pruning is commonly used to retain most of the model's accuracy while significantly reducing the workload. Unfortunately, accelerators designed for unstructured pruning typically employ expensive methods to either determine non-zero activation-weight pairings or reorder computation. These methods require additional storage and memory accesses compared to the more regular data access patterns seen in structurally pruned models. However, even existing works that focus on the more regular access patterns of structured pruning continue to suffer from inefficient designs, which either ignore or expensively handle activation sparsity, leading to low performance. To address these inefficiencies, we leverage structured pruning and propose the multiply-and-fire (MnF) technique, which aims to solve these problems in three ways: (a) the use of a novel event-driven dataflow that naturally exploits activation sparsity without complex, high-overhead logic; (b) an optimized, activation-centric dataflow that maximizes the reuse of activation data in computation and ensures that data are fetched only once from off-chip global and on-chip local memory; and (c) based on the proposed event-driven dataflow, we develop an energy-efficient, high-performance sparsity-aware DNN accelerator. Our results show that the MnF accelerator achieves a significant improvement across a number of modern benchmarks and presents a new direction to enable highly efficient AI inference for both CNN and MLP workloads. Overall, this work achieves a geometric mean of 11.2× higher energy efficiency and 1.41× speedup compared to a state-of-the-art sparsity-aware accelerator.
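As a rough illustration of the event-driven, activation-centric idea described in the abstract, the following Python sketch (not the authors' implementation; the function name, variable names, and layer choice are our own assumptions) processes a fully connected layer by iterating only over non-zero activations, fetching each activation once and reusing it across all of its output contributions.

```python
# Minimal sketch of an event-driven, activation-centric dataflow for one
# fully connected layer. Only non-zero activations generate "events"; each
# activation is read once and its products are accumulated across all
# output neurons. Names here are illustrative, not from the paper.
import numpy as np

def multiply_and_fire_fc(activations: np.ndarray, weights: np.ndarray) -> np.ndarray:
    """activations: shape (in_features,), typically sparse after ReLU.
    weights:     shape (in_features, out_features).
    Returns the output partial sums for the layer."""
    partial_sums = np.zeros(weights.shape[1], dtype=activations.dtype)
    for i in np.nonzero(activations)[0]:                 # skip zero activations entirely
        partial_sums += activations[i] * weights[i, :]   # one fetch, reused across outputs
    return partial_sums

# Usage: with ~70% activation sparsity, only ~30% of weight rows are ever touched.
rng = np.random.default_rng(0)
acts = np.maximum(rng.standard_normal(256), 0)           # ReLU-like sparsity
acts[rng.random(256) < 0.7] = 0
w = rng.standard_normal((256, 64)).astype(acts.dtype)
assert np.allclose(multiply_and_fire_fc(acts, w), acts @ w)
```

In hardware, the per-activation loop would be realized as events dispatched to parallel processing elements rather than a sequential loop; the sketch only conveys the reuse and sparsity-skipping behavior the abstract describes.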

Funder

A*STAR under its RIE2020 IAF-ICP

Publisher

Association for Computing Machinery (ACM)

Subject

Hardware and Architecture, Information Systems, Software
