Affiliation:
1. Shandong Normal University and University of Houston
2. Shandong Normal University
3. University of Houston
Abstract
The convolutional neural network (CNN) is an important deep learning method that is widely used in many fields. However, CNN inference is very time consuming, and convolution usually accounts for most of the execution time. Feature maps and filters contain many zero values, which leads to redundant computations and memory accesses when dense methods are used to compute the convolution. Many recent works exploit this sparsity to skip the computations on zero values and thereby reduce CNN inference time. On the graphics processing unit (GPU) platform, however, existing works cannot fully exploit the sparsity of the feature map and do not achieve satisfactory performance. We therefore design a new parallel strategy that transforms the feature map into a new storage format, avoiding redundant computation on zero values on GPUs. Further exploiting the sparsity of the feature map, we also propose a fused storage format that combines the convolution operation with the following pooling operation to improve performance further. We carry out experiments with mainstream CNN models and achieve better performance than cuDNN and cuSPARSE. For VGG-19, ResNet-50, DenseNet-121, and RegNetX-16GF, we obtain speedups of 1.97×, 2.23×, 2.74×, and 1.58×, respectively, over cuDNN. When only the first method is used, the speedups over cuSPARSE are 2.10×, 1.83×, 2.35×, and 1.35×, respectively.
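The abstract only sketches the approach, and the paper's actual storage format and kernels are not reproduced here. As an illustration of the general idea of skipping zero activations in GPU convolution, the following is a minimal CUDA sketch that assumes a CSR-like layout for one feature-map channel and scatters each nonzero into the output with atomic adds; the kernel name, the scatter strategy, and the CSR choice are assumptions for illustration only, not the authors' method.

```cuda
// sparse_conv_sketch.cu -- illustrative sketch only; not the storage format
// or kernel proposed in the paper.
#include <cuda_runtime.h>
#include <cstdio>

// One input channel in a CSR-like layout: row_ptr[H+1], cols[nnz], vals[nnz].
// The K x K filter stays dense. Each thread takes one nonzero input value and
// scatters its contributions to the affected outputs (stride 1, no padding),
// so zero activations are never touched.
__global__ void sparse_conv2d_scatter(const int *row_ptr, const int *cols,
                                      const float *vals, const float *filter,
                                      float *out, int H, int W, int K) {
    int outH = H - K + 1, outW = W - K + 1;
    int nnz = row_ptr[H];
    int t = blockIdx.x * blockDim.x + threadIdx.x;
    if (t >= nnz) return;
    // Locate the row of nonzero t (a binary search would be used in practice).
    int r = 0;
    while (row_ptr[r + 1] <= t) ++r;
    int c = cols[t];
    float v = vals[t];
    // Input (r, c) contributes to output (r - i, c - j) for each filter tap (i, j).
    for (int i = 0; i < K; ++i) {
        int orow = r - i;
        if (orow < 0 || orow >= outH) continue;
        for (int j = 0; j < K; ++j) {
            int ocol = c - j;
            if (ocol < 0 || ocol >= outW) continue;
            atomicAdd(&out[orow * outW + ocol], v * filter[i * K + j]);
        }
    }
}

int main() {
    const int H = 5, W = 5, K = 3;
    const int outH = H - K + 1, outW = W - K + 1;

    // A mostly-zero 5x5 channel with three nonzeros, converted to CSR on the host.
    float dense[H * W] = {0};
    dense[0 * W + 1] = 2.0f;
    dense[2 * W + 3] = -1.0f;
    dense[4 * W + 4] = 3.0f;
    int h_row_ptr[H + 1], h_cols[H * W], nnz = 0;
    float h_vals[H * W];
    for (int r = 0; r < H; ++r) {
        h_row_ptr[r] = nnz;
        for (int c = 0; c < W; ++c)
            if (dense[r * W + c] != 0.0f) {
                h_cols[nnz] = c;
                h_vals[nnz] = dense[r * W + c];
                ++nnz;
            }
    }
    h_row_ptr[H] = nnz;

    float h_filter[K * K];
    for (int i = 0; i < K * K; ++i) h_filter[i] = 1.0f;  // all-ones 3x3 filter

    int *d_row_ptr, *d_cols;
    float *d_vals, *d_filter, *d_out;
    cudaMalloc(&d_row_ptr, (H + 1) * sizeof(int));
    cudaMalloc(&d_cols, nnz * sizeof(int));
    cudaMalloc(&d_vals, nnz * sizeof(float));
    cudaMalloc(&d_filter, K * K * sizeof(float));
    cudaMalloc(&d_out, outH * outW * sizeof(float));
    cudaMemcpy(d_row_ptr, h_row_ptr, (H + 1) * sizeof(int), cudaMemcpyHostToDevice);
    cudaMemcpy(d_cols, h_cols, nnz * sizeof(int), cudaMemcpyHostToDevice);
    cudaMemcpy(d_vals, h_vals, nnz * sizeof(float), cudaMemcpyHostToDevice);
    cudaMemcpy(d_filter, h_filter, K * K * sizeof(float), cudaMemcpyHostToDevice);
    cudaMemset(d_out, 0, outH * outW * sizeof(float));

    // One thread per nonzero; work is proportional to nnz, not H * W.
    sparse_conv2d_scatter<<<(nnz + 255) / 256, 256>>>(d_row_ptr, d_cols, d_vals,
                                                      d_filter, d_out, H, W, K);

    float h_out[outH * outW];
    cudaMemcpy(h_out, d_out, outH * outW * sizeof(float), cudaMemcpyDeviceToHost);
    for (int r = 0; r < outH; ++r) {
        for (int c = 0; c < outW; ++c) printf("%6.1f ", h_out[r * outW + c]);
        printf("\n");
    }
    return 0;
}
```

The key property this sketch shares with sparse-convolution approaches in general is that the amount of work scales with the number of nonzeros rather than with the full feature-map size; how the paper organizes the format for coalesced access and how it fuses pooling are beyond what the abstract states.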
Funder
National Science Foundation
Natural Science Foundation of Shandong Province
National Natural Science Foundation of China
Funding for Study Abroad Program by the Government of Shandong Province
Publisher
Association for Computing Machinery (ACM)
Subject
Hardware and Architecture, Information Systems, Software
Cited by
3 articles.