Efficient Layer-Wise N:M Sparse CNN Accelerator with Flexible SPEC: Sparse Processing Element Clusters

Published:2023-02-24 Issue:3 Volume:14 Page:528
ISSN:2072-666X
Container-title:Micromachines
language:en
Short-container-title:Micromachines

Author:

Xie Xiaoru¹, Zhu Mingyu¹, Lu Siyuan¹, Wang Zhongfeng¹

Affiliation:

1. School of Electronic Science and Engineering, Nanjing University, Nanjing 210023, China

Abstract

Recently, the layer-wise N:M fine-grained sparse neural network algorithm (i.e., every group of M weights contains N non-zero values) has attracted tremendous attention, as it can effectively reduce computational complexity with negligible accuracy loss. However, the speed-up potential of this algorithm cannot be fully exploited without the right hardware support. In this work, we design an efficient accelerator for N:M sparse convolutional neural networks (CNNs) with layer-wise sparse patterns. First, we analyze the performance of different processing element (PE) structures and extensions to construct a flexible PE architecture. Second, the hardware design accounts for variable sparse convolutional dimensions and sparse ratios. With a sparse PE cluster (SPEC) design, the hardware can efficiently accelerate CNNs with the layer-wise N:M pattern. Finally, we integrate the proposed SPEC into a CNN accelerator with a flexible network-on-chip and a specially designed dataflow. We implement hardware accelerators on the Xilinx ZCU102 and Xilinx VCU118 FPGAs and evaluate them with classical CNNs such as AlexNet, VGG-16, and ResNet-50. Compared with existing accelerators designed for structured and unstructured pruned networks, our design achieves the best performance in terms of power efficiency.
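
The N:M pattern described in the abstract can be illustrated with a minimal sketch. It is not taken from the paper: the function name `prune_n_m`, the magnitude-based choice of which weights to keep, and the per-layer patterns in `layer_patterns` are all assumptions made for illustration only.

```python
def prune_n_m(weights, n, m):
    """Zero all but the n largest-magnitude values in each consecutive
    group of m weights (hypothetical helper; magnitude-based selection
    is an assumption, not the paper's stated method)."""
    if len(weights) % m != 0:
        raise ValueError("number of weights must be a multiple of m")
    pruned = []
    for start in range(0, len(weights), m):
        group = weights[start:start + m]
        # Positions of the n largest magnitudes within this group.
        order = sorted(range(m), key=lambda i: abs(group[i]), reverse=True)
        keep = set(order[:n])
        pruned.extend(w if i in keep else 0.0 for i, w in enumerate(group))
    return pruned


# "Layer-wise" means each layer may use its own (N, M) pair.
# These layer names and ratios are made up for the example.
layer_patterns = {"conv1": (2, 4), "conv2": (1, 4)}

weights = [0.1, -0.9, 0.5, 0.2, 1.0, 0.0, -0.3, 0.4]
sparse = {name: prune_n_m(weights, n, m)
          for name, (n, m) in layer_patterns.items()}
```

With a 2:4 pattern, at most two of every four weights remain non-zero, so a sparse PE needs to perform at most half of the dense multiply-accumulate operations for that layer.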

Funder

National Natural Science Foundation of China

High-Level Personnel Project of Jiangsu Province

Nanjing University

Publisher

MDPI AG

Subject

Electrical and Electronic Engineering, Mechanical Engineering, Control and Systems Engineering

Link

https://www.mdpi.com/2072-666X/14/3/528/pdf

Reference: 25 articles.

1. Krizhevsky, A., Sutskever, I., and Hinton, G.E. (2012, December 3–6). ImageNet Classification with Deep Convolutional Neural Networks. Proceedings of the Advances in Neural Information Processing Systems 25: 26th Annual Conference on Neural Information Processing Systems (NeurIPS), Lake Tahoe, NV, USA.

2. Simonyan, K., and Zisserman, A. (2015, May 7–9). Very Deep Convolutional Networks for Large-Scale Image Recognition. Proceedings of the 3rd International Conference on Learning Representations (ICLR 2015), San Diego, CA, USA.

3. He, K., Zhang, X., Ren, S., and Sun, J. (2016, June 27–30). Deep Residual Learning for Image Recognition. Proceedings of the 2016 IEEE Conference on Computer Vision and Pattern Recognition (CVPR 2016), Las Vegas, NV, USA.

4. A Mixed-Pruning Based Framework for Embedded Convolutional Neural Network Acceleration; Chang; IEEE Trans. Circuits Syst. I Regul. Pap., 2021

5. Han, S., Mao, H., and Dally, W.J. (2016, May 2–4). Deep Compression: Compressing Deep Neural Network with Pruning, Trained Quantization and Huffman Coding. Proceedings of the 4th International Conference on Learning Representations (ICLR 2016), San Juan, Puerto Rico.

Cited by 3 articles.

1. RAMAN: A Reconfigurable and Sparse tinyML Accelerator for Inference on Edge;IEEE Internet of Things Journal;2024-07-15

2. Efficient N:M Sparse DNN Training Using Algorithm, Architecture, and Dataflow Co-Design;IEEE Transactions on Computer-Aided Design of Integrated Circuits and Systems;2024-02

3. Editorial for the Beyond Moore’s Law: Hardware Specialization and Advanced System on Chip;Micromachines;2023-08-11