SNICIT: Accelerating Sparse Neural Network Inference via Compression at Inference Time on GPU

Published: 2023-08-07
Container-title: Proceedings of the 52nd International Conference on Parallel Processing

Author:

Jiang Shui¹, Huang Tsung-Wei², Yu Bei¹, Ho Tsung-Yi¹

Affiliation:

1. The Chinese University of Hong Kong, Hong Kong

2. The University of Wisconsin at Madison, United States of America

Funder

NSF (National Science Foundation)

Publisher

ACM

Link

https://dl.acm.org/doi/pdf/10.1145/3605573.3605625

References (43 articles):

1. Abien Fred Agarap. 2018. Deep learning using rectified linear units (relu). arXiv preprint arXiv:1803.08375 (2018).

2. Vinod Nair Alex Krizhevsky and Geoffrey Hinton. [n. d.]. The CIFAR-10 dataset. https://www.cs.toronto.edu/~kriz/cifar.html

3. Janki Bhimani, Miriam Leeser, and Ningfang Mi. 2015. Accelerating K-Means clustering with parallel implementations and GPU computing. In IEEE HPEC. 1–6.

4. Mauro Bisson and Massimiliano Fatica. 2019. A GPU Implementation of the Sparse Deep Neural Network Graph Challenge. In IEEE HPEC. 1–8.

5. Tom Brown et al. 2020. Language models are few-shot learners. In NeurIPS.

Cited by 8 articles.

1. GSAP: A GPU-Accelerated Stochastic Graph Partitioner;Proceedings of the 53rd International Conference on Parallel Processing;2024-08-12

2. The Impact of Uniform Inputs on Activation Sparsity and Energy-Latency Attacks in Computer Vision;2024 IEEE Security and Privacy Workshops (SPW);2024-05-23

3. Parallel and Heterogeneous Timing Analysis: Partition, Algorithm, and System;Proceedings of the 2024 International Symposium on Physical Design;2024-03-12

4. A Resource-efficient Task Scheduling System using Reinforcement Learning: Invited Paper;2024 29th Asia and South Pacific Design Automation Conference (ASP-DAC);2024-01-22

5. An Efficient Task-Parallel Pipeline Programming Framework;Proceedings of the International Conference on High Performance Computing in Asia-Pacific Region;2024-01-18