AE-Qdrop: Towards Accurate and Efficient Low-Bit Post-Training Quantization for a Convolutional Neural Network
Published: 2024-02-04
Issue: 3
Volume: 13
Page: 644
ISSN: 2079-9292
Container-title: Electronics
Language: en
Short-container-title: Electronics
Author:
Li Jixing 1,2,3, Chen Gang 1,2,3, Jin Min 1,2,3, Mao Wenyu 1,2,3, Lu Huaxiang 1,2,3
Affiliation:
1. Institute of Semiconductors, Chinese Academy of Sciences, Beijing 100083, China
2. University of Chinese Academy of Sciences, Beijing 100089, China
3. Beijing Key Lab of Semiconductor Neural Network Intelligent Perception and Computing Technology, Beijing 100083, China
Abstract
Block-wise reconstruction with adaptive rounding helps achieve acceptable 4-bit post-training quantization accuracy. However, adaptive rounding is time-intensive, and the optimization space of weight elements is constrained to a binary set, which limits the performance of quantized models. Moreover, the optimality of block-wise reconstruction requires that subsequent network blocks remain unquantized. To address these issues, we propose AE-Qdrop, a two-stage post-training quantization scheme comprising block-wise reconstruction and global fine-tuning. In the block-wise reconstruction stage, a progressive optimization strategy replaces adaptive rounding, improving both quantization accuracy and efficiency; in addition, randomly weighted quantized activation mitigates the risk of overfitting. In the global fine-tuning stage, the weights of all quantized network blocks are corrected simultaneously through logit matching and feature matching. Experiments on image classification and object detection tasks confirm that AE-Qdrop achieves accurate and efficient quantization. For 2-bit MobileNetV2, AE-Qdrop outperforms Qdrop in quantization accuracy by 6.26%, and its quantization efficiency is fivefold higher.
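To make the abstract's global fine-tuning stage concrete, the sketch below shows what a combined logit-matching and feature-matching objective could look like in PyTorch, with the full-precision network supervising the quantized one. This is a minimal illustration under stated assumptions, not the paper's implementation: the KL-based logit loss, the MSE feature loss, and the names global_finetune_loss, alpha, and tau are all hypothetical choices for demonstration.

```python
# Minimal sketch (assumption, not the paper's code): a combined
# logit-matching + feature-matching loss for global fine-tuning,
# where the full-precision network supervises the quantized one.
import torch
import torch.nn.functional as F

def global_finetune_loss(q_logits, fp_logits, q_feats, fp_feats,
                         alpha=1.0, tau=1.0):
    # Logit matching: KL divergence between temperature-softened
    # class distributions of the quantized and full-precision nets.
    logit_loss = F.kl_div(
        F.log_softmax(q_logits / tau, dim=1),
        F.softmax(fp_logits / tau, dim=1),
        reduction="batchmean",
    ) * tau ** 2
    # Feature matching: MSE between corresponding block outputs.
    feat_loss = sum(F.mse_loss(q, f) for q, f in zip(q_feats, fp_feats))
    return logit_loss + alpha * feat_loss

# Toy usage with random tensors standing in for network outputs.
q_logits, fp_logits = torch.randn(8, 1000), torch.randn(8, 1000)
q_feats = [torch.randn(8, 64, 14, 14)]
fp_feats = [torch.randn(8, 64, 14, 14)]
print(global_finetune_loss(q_logits, fp_logits, q_feats, fp_feats))
```

Optimizing all quantized blocks jointly against a loss of this form is what distinguishes the global fine-tuning stage from the earlier per-block reconstruction.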
Funder
National Natural Science Foundation of China; CAS Strategic Leading Science and Technology Project
Subject
Electrical and Electronic Engineering, Computer Networks and Communications, Hardware and Architecture, Signal Processing, Control and Systems Engineering