Automatic Mapping of the Best-Suited DNN Pruning Schemes for Real-Time Mobile Acceleration

Published:2022-06-06 Issue:5 Volume:27 Page:1-26
ISSN:1084-4309
Container-title:ACM Transactions on Design Automation of Electronic Systems
language:en
Short-container-title:ACM Trans. Des. Autom. Electron. Syst.

Author:

Gong Yifan¹, Yuan Geng¹, Zhan Zheng¹, Niu Wei², Li Zhengang¹, Zhao Pu¹, Cai Yuxuan¹, Liu Sijia³, Ren Bin², Lin Xue¹, Tang Xulong⁴, Wang Yanzhi¹

Affiliation:

1. Northeastern University, Boston, MA

2. College of William and Mary, Williamsburg, VA

3. Michigan State University, East Lansing, MI

4. University of Pittsburgh, Pittsburgh, PA

Abstract

Weight pruning is an effective model compression technique to tackle the challenges of achieving real-time deep neural network (DNN) inference on mobile devices. However, prior pruning schemes have limited application scenarios due to accuracy degradation, difficulty in leveraging hardware acceleration, and/or restriction to certain types of DNN layers. In this article, we propose a general, fine-grained structured pruning scheme and corresponding compiler optimizations that are applicable to any type of DNN layer while achieving high accuracy and hardware inference performance. With the flexibility of applying different pruning schemes to different layers enabled by our compiler optimizations, we further probe into the new problem of determining the best-suited pruning scheme considering the different acceleration and accuracy performance of various pruning schemes. Two pruning scheme mapping methods (one search based, the other rule based) are proposed to automatically derive the best-suited pruning regularity and block size for each layer of any given DNN. Experimental results demonstrate that our pruning scheme mapping methods, together with the general fine-grained structured pruning scheme, outperform the state-of-the-art DNN optimization framework with up to \(2.48\times\) and \(1.73\times\) DNN inference acceleration on CIFAR-10 and ImageNet datasets without accuracy loss.

Publisher

Association for Computing Machinery (ACM)

Subject

Electrical and Electronic Engineering, Computer Graphics and Computer-Aided Design, Computer Science Applications

Link

https://dl.acm.org/doi/pdf/10.1145/3495532

References: 88 articles.

1. TensorFlow. n.d. TensorFlow Lite. Retrieved March 2 2022 from https://github.com/tensorflow/tflite-support.

2. GitHub. n.d. alibaba/MNN. Retrieved March 2 2022 from https://github.com/alibaba/MNN.

3. PyTorch. n.d. PyTorch Mobile. Retrieved March 2 2022 from https://pytorch.org/mobile/home.

4. On optimizing machine learning workloads via kernel fusion

5. Julia: A Fresh Approach to Numerical Computing

Cited by 3 articles.

1. MOC: Multi-Objective Mobile CPU-GPU Co-Optimization for Power-Efficient DNN Inference;2023 IEEE/ACM International Conference on Computer Aided Design (ICCAD);2023-10-28

2. CoFB: latency-constrained co-scheduling of flows and batches for deep learning inference service on the CPU–GPU system;The Journal of Supercomputing;2023-04-04

3. Compiler-Aware Neural Architecture Search for On-Mobile Real-time Super-Resolution;Lecture Notes in Computer Science;2022