Quantization-Based Optimization Algorithm for Hardware Implementation of Convolution Neural Networks
Published: 2024-04-30
Journal: Electronics
Volume: 13, Issue: 9, Page: 1727
ISSN: 2079-9292
Language: en
Author:
Bassam J. Mohd 1, Khalil M. Ahmad Yousef 1, Anas AlMajali 1, Thaier Hayajneh 2
Affiliation:
1. Department of Computer Engineering, Faculty of Engineering, The Hashemite University, Zarqa 13133, Jordan
2. Department of Computer and Information Sciences, Fordham University, New York, NY 10023, USA
Abstract
Convolutional neural networks (CNNs) have demonstrated remarkable performance in many areas but require significant computation and storage resources. Quantization is an effective method to reduce CNN complexity and simplify implementation. The main research objective is to develop a scalable quantization algorithm for CNN hardware design and to model the performance metrics, for the purpose of implementing CNNs on resource-constrained devices (RCDs) and optimizing layers in deep neural networks (DNNs). The novelty of the algorithm lies in blending two quantization techniques to perform full-model quantization with optimal accuracy and without additional neurons. The algorithm is applied to a selected CNN model and implemented on an FPGA. Implementing the CNN with wide data representations is not possible due to FPGA capacity limits. With the proposed quantization algorithm, we succeeded in implementing the model on the FPGA using 16-, 12-, and 8-bit quantization. Compared to the 16-bit design, the 8-bit design offers a 44% decrease in resource utilization and achieves power and energy reductions of 41% and 42%, respectively. Models show that trading off one quantization bit yields savings of approximately 5.4K LUTs, 4% logic utilization, 46.9 mW power, and 147 μJ energy. The models were also used to estimate performance metrics for a sample DNN design.
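The paper's blended algorithm itself is not reproduced in the abstract, but the core operation it builds on, mapping floating-point weights and activations to an n-bit signed fixed-point representation, can be sketched as follows. This is a minimal illustration under assumed parameters (a Q-format with a chosen number of fractional bits), not the authors' implementation; the function name and bit split are hypothetical.

```python
import numpy as np

def quantize_fixed_point(x, total_bits, frac_bits):
    """Uniformly quantize x to signed fixed-point with `total_bits` bits,
    of which `frac_bits` are fractional (illustrative helper, not the
    paper's algorithm). Values outside the representable range saturate."""
    scale = 2.0 ** frac_bits
    qmin = -(2 ** (total_bits - 1))       # most negative integer code
    qmax = 2 ** (total_bits - 1) - 1      # most positive integer code
    codes = np.clip(np.round(x * scale), qmin, qmax)
    return codes / scale                   # dequantized value seen by the model

# Example: 8-bit quantization with 5 fractional bits (1 sign, 2 integer bits)
w = np.array([0.72, -1.31, 0.015, 0.4])
print(quantize_fixed_point(w, total_bits=8, frac_bits=5))
```

Shrinking `total_bits` from 16 to 8 halves the storage per weight, which is the source of the resource, power, and energy savings the abstract reports; the accompanying accuracy cost is what the blended quantization scheme is designed to control.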