Two Novel Non-Uniform Quantizers with Application in Post-Training Quantization

Published: 2022-09-21, Volume: 10, Issue: 19, Page: 3435
ISSN: 2227-7390
Journal: Mathematics
Language: en

Authors:

Zoran Perić, Danijela Aleksić, Jelena Nikolić, Stefan Tomić

Abstract

With the increasing downsizing of networks and the drive to minimize the cost of deploying neural network (NN) models, edge computing has come to play a significant role in modern artificial intelligence. To cope with the memory constraints of less capable edge systems, a plethora of quantizer models and quantization techniques have been proposed for NN compression, with the goal of fitting the quantized NN (QNN) on the edge device while preserving its accuracy to a high degree. NN compression by means of post-training quantization has attracted considerable research attention, and the efficiency of uniform quantizers (UQs) in this setting has been widely promoted and exploited. In this paper, we propose two novel non-uniform quantizers (NUQs), each of which prudently retains one of the two properties of the simplest UQ. Although both NUQs use the same rule for specifying the support region, they start from a different cell-width setting than a standard UQ. The first quantizer, named the simplest power-of-two quantizer (SPTQ), defines cell widths that are multiplied by powers of two. As is the case in the simplest UQ design, the representation levels of SPTQ are the midpoints of the quantization cells. The second quantizer, named the modified SPTQ (MSPTQ), is a more competitive quantizer model: an enhanced version of SPTQ in which the decision thresholds are centered between the nearest representation levels, as in the UQ design. These properties keep both NUQs relatively simple. Unlike in UQ, the quantization cells of MSPTQ are not of equal width, and its representation levels are not the midpoints of the cells. In this paper, we describe the design procedure of SPTQ and MSPTQ and optimize both for an assumed Laplacian source.
Afterwards, we perform post-training quantization with SPTQ and MSPTQ, assess the achievable QNN accuracy, and show the implementation benefits over the case in which a UQ with the same number of quantization cells is used in the QNN for the same classification task. We believe that both NUQs are particularly significant for memory-constrained environments, where simple and acceptably accurate solutions are of crucial importance.
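The SPTQ and MSPTQ constructions summarized in the abstract can be illustrated with a short NumPy sketch. This is not the authors' design procedure, only an illustration under stated assumptions: a symmetric quantizer with N = 2k cells on [-x_max, x_max], where the positive half is split into k cells whose widths double from the innermost cell outward (Delta_i = 2^(i-1) * Delta_1, hence Delta_1 = x_max / (2^k - 1)), and an MSPTQ that keeps the SPTQ levels while moving the inner thresholds to the midpoints between neighboring levels. The support-region limit x_max is left as a free parameter because the paper's rule for choosing it is not given here, and zero-mean, unit-variance Laplacian samples stand in for NN weights. The function names `uq`, `sptq`, `msptq` and `quantize` are hypothetical.

```python
import numpy as np


def uq(x_max: float, n: int):
    """Symmetric uniform quantizer with n cells on [-x_max, x_max].

    Returns (t, y): n + 1 thresholds (including the outer edges) and n levels.
    """
    t = np.linspace(-x_max, x_max, n + 1)
    y = (t[:-1] + t[1:]) / 2  # levels at the midpoints of equal-width cells
    return t, y


def sptq(x_max: float, n: int):
    """Hypothetical SPTQ sketch: n = 2k cells, widths doubling away from zero.

    Assumes Delta_i = 2**(i-1) * Delta_1 for i = 1..k on [0, x_max]; the widths
    sum to (2**k - 1) * Delta_1 = x_max, so Delta_1 = x_max / (2**k - 1).
    """
    if n % 2:
        raise ValueError("n must be even for this symmetric sketch")
    k = n // 2
    delta1 = x_max / (2**k - 1)
    widths = delta1 * 2.0 ** np.arange(k)               # Delta_1, 2*Delta_1, 4*Delta_1, ...
    pos_t = np.concatenate(([0.0], np.cumsum(widths)))  # 0, t_1, ..., t_k (= x_max)
    t = np.concatenate((-pos_t[:0:-1], pos_t))          # mirror onto the negative half
    y = (t[:-1] + t[1:]) / 2                            # SPTQ keeps midpoint levels
    return t, y


def msptq(x_max: float, n: int):
    """Hypothetical MSPTQ sketch: SPTQ levels, with each inner threshold moved
    to the midpoint between its two neighboring levels."""
    t, y = sptq(x_max, n)
    t = t.copy()
    t[1:-1] = (y[:-1] + y[1:]) / 2
    return t, y


def quantize(x: np.ndarray, t: np.ndarray, y: np.ndarray) -> np.ndarray:
    """Map each sample to the level of its cell; samples beyond +-x_max fall
    into the outermost cells (overload region)."""
    idx = np.searchsorted(t[1:-1], x, side="right")
    return y[idx]


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    # Unit-variance Laplacian: variance = 2 * scale**2, so scale = 1 / sqrt(2).
    x = rng.laplace(0.0, 1 / np.sqrt(2), 100_000)
    n, x_max = 8, 4.0  # 3-bit quantizers; x_max is an arbitrary illustrative choice
    for name, build in (("UQ", uq), ("SPTQ", sptq), ("MSPTQ", msptq)):
        t, y = build(x_max, n)
        mse = np.mean((x - quantize(x, t, y)) ** 2)
        print(f"{name:6s} SQNR = {10 * np.log10(np.var(x) / mse):.2f} dB")
```

In this sketch MSPTQ inherits its representation levels from SPTQ and only the inner thresholds move, so its cells become unequal and its levels are no longer cell midpoints, matching the properties stated in the abstract. Printing the SQNR for several values of x_max is one way to explore the support-region trade-off; the sketch itself implies no particular ranking of the three quantizers.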

Publisher

MDPI AG

Subject

General Mathematics, Engineering (miscellaneous), Computer Science (miscellaneous)

Link

https://www.mdpi.com/2227-7390/10/19/3435/pdf
