Multiply-and-Fire: An Event-Driven Sparse Neural Network Accelerator

Authors:

Miao Yu1, Tingting Xiang1, Venkata Pavan Kumar Miriyala1, Trevor E. Carlson1

Affiliation:

1. National University of Singapore, Singapore

Abstract

Deep neural network inference has become a vital workload for many systems, from edge devices to data centers. To reduce the compute and power demands of deep neural networks (DNNs) running on these systems, pruning is commonly used to retain most of the model's accuracy while significantly reducing the workload. Unfortunately, accelerators designed for unstructured pruning typically employ expensive methods to either determine non-zero activation-weight pairings or reorder computation. These methods require additional storage and memory accesses compared to the more regular data access patterns seen in structurally pruned models. However, even existing works that focus on the more regular access patterns of structured pruning continue to suffer from inefficient designs, which either ignore or expensively handle activation sparsity, leading to low performance. To address these inefficiencies, we leverage structured pruning and propose the multiply-and-fire (MnF) technique, which aims to solve these problems in three ways: (a) the use of a novel event-driven dataflow that naturally exploits activation sparsity without complex, high-overhead logic; (b) an optimized, activation-centric dataflow that maximizes the reuse of activation data in computation and ensures that data are fetched only once from off-chip global and on-chip local memory; and (c) based on the proposed event-driven dataflow, we develop an energy-efficient, high-performance sparsity-aware DNN accelerator. Our results show that the MnF accelerator achieves a significant improvement across a number of modern benchmarks and presents a new direction to enable highly efficient AI inference for both CNN and MLP workloads. Overall, this work achieves a geometric mean of 11.2× higher energy efficiency and 1.41× speedup compared to a state-of-the-art sparsity-aware accelerator.
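As a rough illustration of the event-driven, activation-centric idea described in the abstract, the following Python sketch (not the authors' implementation; the function name, variable names, and layer choice are our own assumptions) processes a fully connected layer by iterating only over non-zero activations, fetching each activation once and reusing it across all of its output contributions.

```python
# Minimal sketch of an event-driven, activation-centric dataflow for one
# fully connected layer. Only non-zero activations generate "events"; each
# activation is read once and its products are accumulated across all
# output neurons. Names here are illustrative, not from the paper.
import numpy as np

def multiply_and_fire_fc(activations: np.ndarray, weights: np.ndarray) -> np.ndarray:
    """activations: shape (in_features,), typically sparse after ReLU.
    weights:     shape (in_features, out_features).
    Returns the output partial sums for the layer."""
    partial_sums = np.zeros(weights.shape[1], dtype=activations.dtype)
    for i in np.nonzero(activations)[0]:                 # skip zero activations entirely
        partial_sums += activations[i] * weights[i, :]   # one fetch, reused across outputs
    return partial_sums

# Usage: with ~70% activation sparsity, only ~30% of weight rows are ever touched.
rng = np.random.default_rng(0)
acts = np.maximum(rng.standard_normal(256), 0)           # ReLU-like sparsity
acts[rng.random(256) < 0.7] = 0
w = rng.standard_normal((256, 64)).astype(acts.dtype)
assert np.allclose(multiply_and_fire_fc(acts, w), acts @ w)
```

In hardware, the per-activation loop would be realized as events dispatched to parallel processing elements rather than a sequential loop; the sketch only conveys the reuse and sparsity-skipping behavior the abstract describes.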

Funder

A*STAR under its RIE2020 IAF-ICP

Publisher

Association for Computing Machinery (ACM)

Subject

Hardware and Architecture, Information Systems, Software
