Combining Weight Approximation, Sharing and Retraining for Neural Network Model Compression

Published: 2024-09-11 Issue: 6 Volume: 23 Page: 1-23
ISSN: 1539-9087
Container-title: ACM Transactions on Embedded Computing Systems
Language: en
Short-container-title: ACM Trans. Embed. Comput. Syst.

Author:

Kashikar Prachi¹, Sentieys Olivier², Sinha Sharad³

Affiliation:

1. Indian Institute of Technology Goa, Ponda, India

2. INRIA, University of Rennes, Rennes, France

3. Computer Science, IIT Goa, Ponda, India

Abstract

Neural network model compression is essential for deploying models within the memory and storage available on different computing systems. The continuous drive for higher accuracy increases the size and complexity of these models, making them challenging to deploy in resource-constrained computing environments. This article proposes several model compression algorithms that exploit weight characteristics and studies their performance in depth. The algorithms manipulate the exponents and mantissas in the floating-point representations of weights. We also present a retraining method that uses the proposed algorithms to further reduce the size of pre-trained models. The results presented in this article are mainly for the BFloat16 floating-point format. The proposed weight manipulation algorithms save at least 20% of memory on state-of-the-art image classification models with very minor accuracy loss. The retraining method recovers this loss and saves at least 30% of memory, with potential memory savings of up to 43%. We compare the proposed methods against state-of-the-art model compression techniques in terms of accuracy, memory savings, inference time, and energy.
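
For readers unfamiliar with the format the abstract mentions, the following is a minimal Python sketch of how a weight's BFloat16 representation splits into sign, exponent, and mantissa fields (1, 8, and 7 bits). It is not the article's algorithm: the function names are hypothetical, the float32-to-BFloat16 conversion here uses simple truncation rather than rounding, and `zero_low_mantissa_bits` is only an illustrative stand-in for the general idea of approximating a weight by altering its mantissa.

```python
import struct
from typing import Tuple


def float_to_bf16_bits(x: float) -> int:
    """Return a 16-bit BFloat16 pattern for x.

    Assumption: conversion by truncation, i.e. keeping the upper 16 bits
    of the IEEE 754 float32 encoding (no rounding).
    """
    (bits32,) = struct.unpack(">I", struct.pack(">f", x))
    return bits32 >> 16


def bf16_bits_to_float(bits16: int) -> float:
    """Expand a 16-bit BFloat16 pattern back to a Python float."""
    (x,) = struct.unpack(">f", struct.pack(">I", (bits16 & 0xFFFF) << 16))
    return x


def split_bf16(bits16: int) -> Tuple[int, int, int]:
    """Split a BFloat16 pattern into (sign, exponent, mantissa) fields."""
    sign = (bits16 >> 15) & 0x1       # 1 sign bit
    exponent = (bits16 >> 7) & 0xFF   # 8 exponent bits, bias 127
    mantissa = bits16 & 0x7F          # 7 mantissa bits
    return sign, exponent, mantissa


def zero_low_mantissa_bits(bits16: int, k: int) -> int:
    """Hypothetical approximation: clear the k lowest mantissa bits."""
    if not 0 <= k <= 7:
        raise ValueError("k must be between 0 and 7")
    return bits16 & (0xFFFF & ~((1 << k) - 1))


if __name__ == "__main__":
    w = 0.15625  # example weight, equal to 1.25 * 2**-3
    bits = float_to_bf16_bits(w)
    print(split_bf16(bits))
    print(bf16_bits_to_float(zero_low_mantissa_bits(bits, 6)))
```

Clearing low mantissa bits makes more weights share the same bit pattern, which is one simple way approximated weights can become easier to store compactly; the article's own algorithms may differ substantially from this sketch.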

Funder

DST-INRIA-CNRS

Publisher

Association for Computing Machinery (ACM)

Link

https://dl.acm.org/doi/pdf/10.1145/3687466
