Custom Network Quantization Method for Lightweight CNN Acceleration on FPGAs

Published: 2024-04-02 Volume: 2024 Page: 1-11
ISSN: 1550-1477
Container-title: International Journal of Distributed Sensor Networks
Language: en

Author:

Yi Lingjie¹, Xie Xianzhong¹, Wan Yi¹, Jiang Bo¹, Chen Junfan²

Affiliation:

1. School of Computer Science and Technology, Chongqing University of Posts and Telecommunications, Chongqing, China

2. Chongqing Haiyun Jiexun Technology, Chongqing, China

Abstract

Low-bit quantization can effectively reduce both the storage and the computation costs of deep neural networks. However, existing quantization methods yield unsatisfactory results when applied to lightweight networks. In addition, after quantization, differences in data types between operators can cause problems when deploying networks on Field Programmable Gate Arrays (FPGAs). Moreover, some operators cannot be accelerated heterogeneously on FPGAs, so computation tasks switch frequently between the Advanced RISC Machine (ARM) and FPGA environments. To address these problems, this paper proposes a custom network quantization approach. First, an improved PArameterized Clipping Activation (PACT) method is employed during quantization-aware training to restrict the value range of neural network parameters and reduce the precision loss arising from quantization. Second, the Consecutive Execution Of Convolution Operators (CEOCO) strategy is used to mitigate the resource consumption caused by frequent environment switching. The proposed approach is validated on the Xilinx Zynq UltraScale+ MPSoC 3EG and Virtex UltraScale+ XCVU13P platforms, with the MobileNetv1, MobileNetv3, PPLCNet, and PPLCNetv2 networks as testbeds. Experiments are conducted on the miniImageNet, CIFAR-10, and Oxford 102 Flowers public datasets. Compared with the original models, the proposed optimization methods reduce accuracy by 1.2% on average. Compared with the conventional quantization method, accuracy remains almost unchanged, while the frames per second (FPS) on FPGAs improves by an average of 2.1 times.
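To make the clipping idea concrete, the following is a minimal sketch of the standard PACT forward pass as it is commonly formulated: an activation is clipped to [0, alpha] and then quantized uniformly to k bits. This is not the improved variant proposed in the paper, whose details the abstract does not give. The function name `pact_quantize` and its parameters are hypothetical, and in real quantization-aware training `alpha` would be a learned parameter rather than a fixed constant.

```python
def pact_quantize(x, alpha, bits):
    """Standard PACT forward pass (illustrative, not the paper's improved method).

    x     : a single activation value
    alpha : clipping threshold (learned during training; fixed here)
    bits  : target bit width k
    """
    if alpha <= 0 or bits < 1:
        raise ValueError("alpha must be positive and bits must be >= 1")
    levels = (1 << bits) - 1            # 2^k - 1 quantization steps
    clipped = min(max(x, 0.0), alpha)   # y = clip(x, 0, alpha)
    scale = levels / alpha              # steps per unit of activation
    return round(clipped * scale) / scale


def pact_quantize_all(values, alpha, bits):
    """Apply pact_quantize element-wise to a list of activations."""
    return [pact_quantize(v, alpha, bits) for v in values]
```

A smaller `alpha` narrows the representable range but makes each quantization step finer, which is the trade-off the clipping threshold controls.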

Funder

Technological Innovation and Application Development of Chongqing

Publisher

Hindawi Limited

Link

http://downloads.hindawi.com/journals/dsn/2024/8018810.pdf
