Affiliation:
1. School of Microelectronics, Dalian University of Technology, Dalian 116024, China
2. Key Laboratory of Intelligent Control and Optimization for Industrial Equipment, Ministry of Education, Dalian University of Technology, Dalian 116024, China
Abstract
Convolution is one of the most essential operations in FPGA-based hardware accelerators. However, existing designs often neglect the inherent architecture of the FPGA, which poses a severe challenge to hardware resources. Although previous works have proposed approximate multipliers or convolution acceleration algorithms to address this issue, the resulting accuracy loss and resource occupation easily lead to performance degradation. To address this, we first propose two kinds of resource-efficient, accurate optimized multipliers based on LUTs or carry chains. Then, targeting FPGA-based platforms, a generic multiply–accumulate (MAC) structure is constructed by directly accumulating the partial products produced by the proposed optimized radix-4 Booth multipliers, without computing intermediate multiplication and addition results. Experimental results demonstrate that the proposed multiplier achieves up to a 51% look-up table (LUT) reduction compared to the Vivado area-optimized multiplier IP. Furthermore, a convolutional processing unit built on the proposed structure achieves a 36% LUT reduction compared to existing methods. As case studies, the proposed method is applied to the discrete cosine transform (DCT), LeNet, and MobileNet-V3, achieving hardware resource savings without loss of accuracy.
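To make the idea concrete, the following is a minimal bit-level Python sketch of radix-4 Booth recoding and a MAC that sums Booth partial products directly, in the spirit of the structure described above. It is an illustrative software model under assumed 8-bit signed operands, not the authors' hardware design; all function names are hypothetical.

```python
def booth_radix4_digits(b, width=8):
    """Recode a signed multiplier b into radix-4 Booth digits in {-2,-1,0,1,2}.

    Digit i is b[2i-1] + b[2i] - 2*b[2i+1], scanned over overlapping bit
    triplets with an implicit 0 below the LSB; digit i carries weight 4**i.
    (Python's >> sign-extends negative ints, so two's complement works here.)
    """
    digits, prev = [], 0          # prev holds b[2i-1]
    for i in range(0, width, 2):
        lo = (b >> i) & 1         # b[2i]
        hi = (b >> (i + 1)) & 1   # b[2i+1]
        digits.append(prev + lo - 2 * hi)
        prev = hi
    return digits


def mac_booth(pairs, width=8):
    """Multiply-accumulate sum(a*b) by adding Booth partial products
    straight into one accumulator, never forming each product separately."""
    acc = 0
    for a, b in pairs:
        for i, d in enumerate(booth_radix4_digits(b, width)):
            acc += (d * a) << (2 * i)   # partial product d * a * 4**i
    return acc
```

For example, `booth_radix4_digits(7)` yields `[-1, 2, 0, 0]` (i.e., 7 = -1 + 2*4), and `mac_booth([(3, 5), (-2, 7)])` returns 3*5 + (-2)*7 = 1. In hardware, skipping the per-product result means the partial products of all multipliers feed a shared accumulation tree, which is where the LUT savings come from.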
Funder
National Science and Technology Major Project
Aeronautical Science Foundation of China
Science and Technology Innovation Foundation of Dalian
Fundamental Research Funds for the Central Universities
Subject
Electrical and Electronic Engineering, Computer Networks and Communications, Hardware and Architecture, Signal Processing, Control and Systems Engineering