Affiliation:
1. AnnLab, Institute of Semiconductors, Chinese Academy of Sciences, Beijing 100083, China
2. Beijing Key Laboratory of Semiconductor Neural Network Intelligent Sensing and Computing Technology, Beijing 100083, China
3. College of Materials Science and Opto-Electronic Technology & School of Integrated Circuits, University of Chinese Academy of Sciences, Beijing 100049, China
Abstract
Deep neural network quantization is widely used when deploying models on mobile or edge devices because it effectively reduces memory overhead and speeds up inference. However, quantization inevitably degrades model performance and the equivalence between the quantized and full-precision models. Moreover, access to labeled datasets is often denied because they are considered valuable assets by companies and institutes, which makes quantization training difficult without sufficient labeled data. To address these issues, we propose a novel quantization pipeline named DiffQuant, which performs quantization training on unlabeled datasets. The pipeline has two cores: the compression difference (CD) and the model compression loss (MCL). The CD measures the degree of equivalence loss between the full-precision and quantized models, and the MCL supports fine-tuning the quantized model on unlabeled data. In addition, we design a quantization training scheme that quantizes both the batch normalization (BN) layers and the biases. Experimental results show that our method outperforms state-of-the-art methods on ResNet18/34/50 networks, maintaining performance while reducing the CD. The 8-bit quantized ResNet18/34/50 models achieve Top-1 accuracies of 70.08%, 74.11%, and 76.16% on the ImageNet dataset, narrowing the gap to the full-precision networks to 0.55%, 0.61%, and 0.71%, respectively, with CD values of only 7.45%, 7.48%, and 8.52%, which allows DiffQuant to further exploit the potential of quantization.
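The abstract does not reproduce the exact CD formula; the sketch below is only a minimal illustration of the general idea of a label-free divergence metric between a full-precision model and its quantized counterpart. The function name `compression_difference`, the top-1 disagreement proxy, and the data-loader interface are assumptions for illustration, not the paper's definition.

```python
# Hypothetical sketch: measure how often a quantized model's top-1 prediction
# disagrees with the full-precision model's on unlabeled data. This is only an
# assumed proxy for the "equivalence loss" idea described in the abstract.
import torch


@torch.no_grad()
def compression_difference(fp_model, q_model, unlabeled_loader, device="cpu"):
    """Return the percentage of samples on which the two models disagree."""
    fp_model.eval()
    q_model.eval()
    disagreements, total = 0, 0
    for batch in unlabeled_loader:
        # Labels, if present in the batch, are ignored: only inputs are used.
        images = batch[0] if isinstance(batch, (list, tuple)) else batch
        images = images.to(device)
        fp_pred = fp_model(images).argmax(dim=1)
        q_pred = q_model(images).argmax(dim=1)
        disagreements += (fp_pred != q_pred).sum().item()
        total += images.size(0)
    return 100.0 * disagreements / max(total, 1)
```

Under this assumed definition, a lower value means the quantized model behaves more like the full-precision one, which is the sense in which the reported CD values of 7.45%, 7.48%, and 8.52% indicate preserved equivalence.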
Funder
Key-Area Research and Development Program of Guangdong Province
Subject
Electrical and Electronic Engineering, Computer Networks and Communications, Hardware and Architecture, Signal Processing, Control and Systems Engineering