Harmonious Coexistence of Structured Weight Pruning and Ternarization for Deep Neural Networks

Published:2020-04-03 Issue:04 Volume:34 Page:6623-6630
ISSN:2374-3468
Container-title:Proceedings of the AAAI Conference on Artificial Intelligence
Short-container-title:AAAI

Author:

Yang Li,He Zhezhi,Fan Deliang

Abstract

Deep convolutional neural networks (DNNs) have demonstrated phenomenal success and are widely used in many computer vision tasks. However, their enormous model size and high computational complexity prohibit wide deployment on resource-limited embedded systems, such as FPGAs and mGPUs. As the two most widely adopted model compression techniques, weight pruning and quantization compress a DNN model by introducing weight sparsity (i.e., forcing a portion of the weights to zero) and by quantizing weights into limited bit-width values, respectively. Although some works attempt to combine weight pruning and quantization, we still observe disharmony between the two, especially when more aggressive compression schemes (e.g., structured pruning and low bit-width quantization) are used. In this work, taking the FPGA as the test computing platform and Processing Elements (PEs) as the basic parallel computing unit, we first propose a PE-wise structured pruning scheme, which introduces weight sparsification that takes the PE architecture into account. In addition, we integrate it with an optimized weight ternarization approach that quantizes weights into ternary values ({-1, 0, +1}), thus converting the dominant convolution operations in the DNN from multiplication-and-accumulation (MAC) to addition-only, and compressing the original model (from 32-bit floating point to a 2-bit ternary representation) by at least 16 times. We then investigate and solve the coexistence issue between PE-wise structured pruning and ternarization by proposing a Weight Penalty Clipping (WPC) technique with a self-adapting threshold. Our experiments show that the fusion of our proposed techniques achieves a state-of-the-art ∼21× PE-wise structured compression rate with merely 1.74%/0.94% (top-1/top-5) accuracy degradation for ResNet-18 on the ImageNet dataset.
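
The two ideas named in the abstract, group-wise (structured) pruning and ternarization, can be illustrated with a small sketch. This is not the paper's method: the function names, the L1-norm ranking of groups, and the 0.7 × mean(|w|) threshold are assumptions borrowed from common heuristics, and the WPC technique is not modeled at all. Each group here stands in loosely for the weights mapped to one PE.

```python
from typing import List, Tuple


def prune_groups(weights: List[float], group_size: int, keep: int) -> List[float]:
    """Zero out whole groups of consecutive weights, keeping the `keep`
    groups with the largest L1 norm (assumed ranking criterion)."""
    groups = [weights[i:i + group_size] for i in range(0, len(weights), group_size)]
    norms = [sum(abs(w) for w in g) for g in groups]
    ranked = sorted(range(len(groups)), key=lambda i: norms[i], reverse=True)
    kept = set(ranked[:keep])
    pruned: List[float] = []
    for i, g in enumerate(groups):
        pruned.extend(g if i in kept else [0.0] * len(g))
    return pruned


def ternarize(weights: List[float], ratio: float = 0.7) -> Tuple[List[int], float]:
    """Map weights to codes in {-1, 0, +1} plus one shared scale factor.

    Assumption: threshold = ratio * mean(|w|); the scale is the mean
    magnitude of the weights that survive the threshold.
    """
    if not weights:
        return [], 0.0
    threshold = ratio * sum(abs(w) for w in weights) / len(weights)
    codes = [0 if abs(w) <= threshold else (1 if w > 0 else -1) for w in weights]
    survivors = [abs(w) for w, c in zip(weights, codes) if c != 0]
    scale = sum(survivors) / len(survivors) if survivors else 0.0
    return codes, scale


def storage_ratio(float_bits: int = 32, ternary_bits: int = 2) -> float:
    """Storage saving from bit-width reduction alone, before any pruning."""
    return float_bits / ternary_bits


# Toy example: two groups of four weights; keep only the stronger group.
toy_weights = [0.9, -0.8, 0.05, -0.02, 0.6, 0.01, -0.03, 0.04]
toy_pruned = prune_groups(toy_weights, group_size=4, keep=1)
toy_codes, toy_scale = ternarize(toy_pruned)
```

With ternary codes, a dot product reduces to adding or subtracting activations and applying the shared scale once, which is the sense in which MAC operations become addition-only. The 32-bit to 2-bit change alone accounts for the 16× figure; any further gain has to come from pruning.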

Publisher

Association for the Advancement of Artificial Intelligence (AAAI)

Subject

General Medicine

Cited by 11 articles.

1. On-Chip-Registration Supported ASR Processor Using Two-Step Transfer-Learning TWN;IEEE Transactions on Circuits and Systems II: Express Briefs;2024-05

2. Hardware Efficient Speech Enhancement With Noise Aware Multi-Target Deep Learning;IEEE Open Journal of Circuits and Systems;2024

3. Recent Advances and Future Prospects for Memristive Materials, Devices, and Systems;ACS Nano;2023-06-29

4. Digital-Assisted Analog In-Memory Computing with RRAM Devices;2023 International VLSI Symposium on Technology, Systems and Applications (VLSI-TSA/VLSI-DAT);2023-04-17

5. Structured Bayesian Compression for Deep Neural Networks Based on the Turbo-VBI Approach;IEEE Transactions on Signal Processing;2023