Accelerating Convolutional Neural Network by Exploiting Sparsity on GPUs

Authors:

Xu Weizhi (1), Sun Yintai (2), Fan Shengyu (2), Yu Hui (1), Fu Xin (3)

Affiliation:

1. Shandong Normal University and University of Houston

2. Shandong Normal University

3. University of Houston

Abstract

The convolutional neural network (CNN) is an important deep learning method that is widely used in many fields. However, CNN inference is time consuming, and convolution usually accounts for most of the execution time. Feature maps and filters contain many zero values, so dense convolution methods perform redundant computations and memory accesses on these zeros. Many recent works exploit sparsity to skip the computations on zero values and thereby reduce CNN inference time. On the graphics processing unit (GPU) platform, however, existing works cannot fully exploit the sparsity of the feature map and do not achieve satisfactory performance. We therefore design a new parallel strategy that transforms the feature map into a new storage format, avoiding redundant computation on zero values on GPUs. Further exploiting the sparsity of the feature map, we propose a fused storage format that combines the convolution operation with the following pooling operation to improve performance further. Experiments on mainstream CNN models show better performance than cuDNN and cuSPARSE. For VGG-19, ResNet-50, DenseNet-121, and RegNetX-16GF, we obtain speedups of 1.97×, 2.23×, 2.74×, and 1.58× over cuDNN, respectively; the speedups over cuSPARSE are 2.10×, 1.83×, 2.35×, and 1.35× when only the first method is used.
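To illustrate the general idea of skipping zero values, the sketch below is a minimal CUDA example, not the authors' kernel or storage format (the abstract does not specify either). It assumes a single-channel feature map compressed into a coordinate list of nonzeros (values plus row/column indices) and a scatter-style kernel in which each thread processes one nonzero input element; the names and sizes (sparse_conv, H, W, K) are illustrative only.

```cuda
// Hypothetical sketch: scatter-style sparse convolution that stores only the
// nonzero entries of a single-channel feature map and skips all multiplications
// involving zero-valued inputs. Not the paper's actual storage format.
#include <cstdio>
#include <cuda_runtime.h>
#include <vector>

#define H 6                    // input height (illustrative)
#define W 6                    // input width
#define K 3                    // filter size
#define OH (H - K + 1)         // output height ("valid" convolution, no padding)
#define OW (W - K + 1)         // output width

// One thread per nonzero input element: each thread scatters that element's
// contributions to every output position its receptive field overlaps.
__global__ void sparse_conv(const float* val, const int* row, const int* col,
                            int nnz, const float* filt, float* out) {
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i >= nnz) return;
    float v = val[i];
    int r = row[i], c = col[i];
    for (int kr = 0; kr < K; ++kr)
        for (int kc = 0; kc < K; ++kc) {
            int orow = r - kr, ocol = c - kc;   // output position hit by (r, c)
            if (orow >= 0 && orow < OH && ocol >= 0 && ocol < OW)
                atomicAdd(&out[orow * OW + ocol], v * filt[kr * K + kc]);
        }
}

int main() {
    float in[H * W] = {0};                      // sparse feature map: mostly zeros
    in[1 * W + 2] = 3.0f; in[2 * W + 0] = 1.5f; in[4 * W + 4] = -2.0f;
    float filt[K * K];
    for (int i = 0; i < K * K; ++i) filt[i] = 0.1f * (i + 1);

    // Compress the feature map: keep only nonzero values and their coordinates.
    std::vector<float> val; std::vector<int> row, col;
    for (int r = 0; r < H; ++r)
        for (int c = 0; c < W; ++c)
            if (in[r * W + c] != 0.0f) {
                val.push_back(in[r * W + c]); row.push_back(r); col.push_back(c);
            }
    int nnz = (int)val.size();

    float *d_val, *d_filt, *d_out; int *d_row, *d_col;
    cudaMalloc(&d_val, nnz * sizeof(float));
    cudaMalloc(&d_row, nnz * sizeof(int));
    cudaMalloc(&d_col, nnz * sizeof(int));
    cudaMalloc(&d_filt, K * K * sizeof(float));
    cudaMalloc(&d_out, OH * OW * sizeof(float));
    cudaMemcpy(d_val, val.data(), nnz * sizeof(float), cudaMemcpyHostToDevice);
    cudaMemcpy(d_row, row.data(), nnz * sizeof(int), cudaMemcpyHostToDevice);
    cudaMemcpy(d_col, col.data(), nnz * sizeof(int), cudaMemcpyHostToDevice);
    cudaMemcpy(d_filt, filt, K * K * sizeof(float), cudaMemcpyHostToDevice);
    cudaMemset(d_out, 0, OH * OW * sizeof(float));

    sparse_conv<<<(nnz + 127) / 128, 128>>>(d_val, d_row, d_col, nnz, d_filt, d_out);

    float out[OH * OW];
    cudaMemcpy(out, d_out, OH * OW * sizeof(float), cudaMemcpyDeviceToHost);
    for (int r = 0; r < OH; ++r) {
        for (int c = 0; c < OW; ++c) printf("%6.2f ", out[r * OW + c]);
        printf("\n");
    }
    cudaFree(d_val); cudaFree(d_row); cudaFree(d_col);
    cudaFree(d_filt); cudaFree(d_out);
    return 0;
}
```

A real implementation would additionally handle multiple channels, batching, and padding, and, as the abstract describes, could feed the convolution result into the following pooling layer through a fused storage format rather than writing the full dense output back to memory.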

Funder

National Science Foundation

Natural Science Foundation of Shandong Province

National Natural Science Foundation of China

Funding for Study Abroad Program by the Government of Shandong Province

Publisher

Association for Computing Machinery (ACM)

Subject

Hardware and Architecture, Information Systems, Software

Cited by 2 articles:

1. Winols: A Large-Tiling Sparse Winograd CNN Accelerator on FPGAs. ACM Transactions on Architecture and Code Optimization, 2024-01-31.

2. Accelerating Deep Neural Networks via Semi-Structured Activation Sparsity. 2023 IEEE/CVF International Conference on Computer Vision Workshops (ICCVW), 2023-10-02.
