WRA-MF: A Bit-Level Convolutional-Weight-Decomposition Approach to Improve Parallel Computing Efficiency for Winograd-Based CNN Acceleration-Reference-Cited by-同舟云学术

WRA-MF: A Bit-Level Convolutional-Weight-Decomposition Approach to Improve Parallel Computing Efficiency for Winograd-Based CNN Acceleration

Published:2023-12-08 Issue:24 Volume:12 Page:4943
ISSN:2079-9292
Container-title:Electronics
language:en
Short-container-title:Electronics

Author:

Xiang Siwei¹,Lv Xianxian¹,Meng Yishuo¹,Wang Jianfei¹,Lu Cimang²,Yang Chen¹^ORCID

Affiliation:

1. The School of Microelectronics, Xi’an Jiaotong University, Xi’an 710049, China

2. Shenzhen Xinrai Sinovoice Technology Co., Ltd., Shenzhen 518000, China

Abstract

FPGA-based convolutional neural network (CNN) accelerators have been extensively studied recently. To exploit the parallelism of multiplier–accumulator computation in convolution, most FPGA-based CNN accelerators heavily depend on the number of on-chip DSP blocks in the FPGA. Consequently, the performance of the accelerators is restricted by the limitation of the DSPs, leading to an imbalance in the utilization of other FPGA resources. This work proposes a multiplication-free convolutional acceleration scheme (named WRA-MF) to relax the pressure on the required DSP resources. Firstly, the proposed WRA-MF employs the Winograd algorithm to reduce the computational density, and it then performs bit-level convolutional weight decomposition to eliminate the multiplication operations. Furthermore, by extracting common factors, the complexity of the addition operations is reduced. Experimental results on the Xilinx XCVU9P platform show that the WRA-MF can achieve 7559 GOP/s throughput at a 509 MHz clock frequency for VGG16. Compared with state-of-the-art works, the WRA-MF achieves up to a 3.47×–27.55× area efficiency improvement. The results indicate that the proposed architecture achieves a high area efficiency while ameliorating the imbalance in the resource utilization.

Funder

National Natural Science Foundation of China

Shenzhen Park of Hetao Shenzhen–Hong Kong Science and Technology Innovation Cooperation Zone Program

Publisher

MDPI AG

Subject

Electrical and Electronic Engineering,Computer Networks and Communications,Hardware and Architecture,Signal Processing,Control and Systems Engineering

Link

https://www.mdpi.com/2079-9292/12/24/4943/pdf

Reference34 articles.

1. Wodzinski, M., Skalski, A., Hemmerling, D., Orozco-Arroyave, J.R., and Nöth, E. (2019, January 23–27). Deep Learning Approach to Parkinson’s Disease Detection Using Voice Recordings and Convolutional Neural Network Dedicated to Image Classification. Proceedings of the 2019 41st Annual International Conference of the IEEE Engineering in Medicine and Biology Society (EMBC), Berlin, Germany.

2. He, K., Zhang, X., Ren, S., and Sun, J. (2016, January 27–30). Deep Residual Learning for Image Recognition. Proceedings of the 2016 IEEE Conference on Computer Vision and Pattern Recognition (CVPR), Las Vegas, NV, USA.

3. Girshick, R., Donahue, J., Darrell, T., and Malik, J. (2014, January 23–28). Rich Feature Hierarchies for Accurate Object Detection and Semantic Segmentation. Proceedings of the 2014 IEEE Conference on Computer Vision and Pattern Recognition (CVPR), Columbus, OH, USA.

4. Afdhal, A., Nasaruddin, N., Fuadi, Z., Sugiarto, S., Riza, H., and Saddami, K. (2022, January 10–11). Evaluation of Benchmarking Pre-Trained CNN Model for Autonomous Vehicles Object Detection in Mixed Traffic. Proceedings of the 2022 International Conference on ICT for Smart Society (ICISS), Bandung, Indonesia.

5. Yang, T.-J., Chen, Y.-H., and Sze, V. (2017, January 21–26). Designing Energy-Efficient Convolutional Neural Networks Using Energy-Aware Pruning. Proceedings of the 2017 IEEE Conference on Computer Vision and Pattern Recognition (CVPR), Honolulu, HI, USA.