LUT‐DSP usage trade‐off for re‐configurable convolution acceleration core based on small logarithmic floating point representation

Authors:

Xiong Botao¹, Fan Sheng¹, He Xintong¹, Zhou Zezhao¹, Yang Runhua¹, Li Sicun¹, Shen Rensheng¹, Chang Yuchun¹

Affiliation:

1. School of Microelectronics, Dalian University of Technology, Dalian, China

Abstract

The challenge in designing a high‐performance convolution accelerator on a field‐programmable gate array (FPGA) is to take full advantage of the on‐chip computing resources. Previously reported CNN accelerators always exhaust the on‐chip DSPs while leaving other computing resources under‐utilized. This brief therefore presents a novel convolution acceleration core based on the small logarithmic floating‐point (SLFP) format, with three contributions. (1) The SLFP<3,5> multiplier is implemented using only LUT6s and, with the aid of the carry chain, operates at 650 MHz; the SLFP<3,5> format provides sufficient accuracy for most CNNs. A similar structure can also be used to implement an SLFP<3,5> divider. (2) DSPs in the TWO24 SIMD mode are cascaded to implement a 9‐input adder tree. The sum of the multiples of elements (e.g., , ) is easily obtained by configuring the last DSP of the 9‐input adder tree in accumulation mode, so the tree can support further kernel sizes (e.g., , ) with a high utilization rate (). (3) The SLFP‐based convolution core uses only LUT6s and DSPs and achieves 1300 MOPS, 433 MOPS, and 81 MOPS for the , , and kernels, respectively. In summary, the proposed convolution accelerator not only balances the usage of LUT6s and DSPs but, because the distribution of SLFP numbers closely resembles that of FP32 numbers, can also quantize most CNN models with a few simple scaling operations instead of a compute‐intensive retraining algorithm.
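The abstract does not spell out the SLFP<3,5> bit layout, so the following is a minimal sketch under a hypothetical reading: a sign bit plus an 8‐bit code holding log2|x| in fixed point, with 3 integer bits, 5 fraction bits, and an assumed bias of 128. Under that reading, multiplication reduces to XORing the signs and adding the codes, which is the kind of narrow addition a LUT6 plus carry‐chain structure handles well, and division is the same structure with a subtraction. The names `encode`, `decode`, `mul`, `div`, and `choose_shift` are illustrative, zero is not modelled, and `choose_shift` is only a hypothetical power‐of‐two stand‐in for the "simple scaling operations" the brief mentions, not the authors' quantization procedure.

```python
import math

# Hypothetical SLFP<3,5> reading: sign bit + 8-bit log2 code (3 int, 5 frac bits).
E_BITS = 3
M_BITS = 5
CODE_BITS = E_BITS + M_BITS
BIAS = 1 << (CODE_BITS - 1)      # assumed bias: code 128 represents |x| = 1
MAX_CODE = (1 << CODE_BITS) - 1
FRAC = 1 << M_BITS               # 32 code steps per power of two


def saturate(code: int) -> int:
    """Clamp a log code into the representable range."""
    return min(max(code, 0), MAX_CODE)


def encode(x: float) -> tuple[int, int]:
    """Round x to (sign, code); zero would need a reserved code (not modelled)."""
    if x == 0:
        raise ValueError("zero needs a reserved code, which this sketch omits")
    sign = 1 if x < 0 else 0
    code = round(math.log2(abs(x)) * FRAC) + BIAS
    return sign, saturate(code)


def decode(value: tuple[int, int]) -> float:
    """Convert (sign, code) back to a float."""
    sign, code = value
    magnitude = 2.0 ** ((code - BIAS) / FRAC)
    return -magnitude if sign else magnitude


def mul(a: tuple[int, int], b: tuple[int, int]) -> tuple[int, int]:
    """Multiply: XOR the signs and add the log codes (a single adder)."""
    (sa, ca), (sb, cb) = a, b
    return sa ^ sb, saturate(ca + cb - BIAS)


def div(a: tuple[int, int], b: tuple[int, int]) -> tuple[int, int]:
    """Divide: same structure as mul, but the log codes are subtracted."""
    (sa, ca), (sb, cb) = a, b
    return sa ^ sb, saturate(ca - cb + BIAS)


def choose_shift(values: list[float]) -> int:
    """Pick k so that max|v| * 2**k stays below the largest representable value."""
    largest_log2 = (MAX_CODE - BIAS) / FRAC
    peak = max(abs(v) for v in values)
    return math.floor(largest_log2 - math.log2(peak))


if __name__ == "__main__":
    print(decode(mul(encode(2.0), encode(4.0))))   # 8.0
    print(decode(div(encode(8.0), encode(-2.0))))  # -4.0
    weights = [0.01, 0.5, -0.02]
    k = choose_shift(weights)
    print(k, [decode(encode(w * 2**k)) for w in weights])
```

With 5 fraction bits, round‐to‐nearest in the log domain bounds the relative error of a single encoding by about 2^(1/64) − 1 ≈ 1.1%. Scaling a whole tensor by 2^k only shifts every code by k·32, which is why a per‐tensor power‐of‐two scale is cheap in this representation.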
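In the TWO24 SIMD mode the 48‐bit DSP ALU behaves as two independent 24‐bit adders, so one chain of cascaded DSPs can sum two streams of products at once. The sketch below models only that arithmetic: two signed lanes packed into one 48‐bit word, a lane‐wise add that drops the carry out of the low lane, a 9‐input chain, and a final accumulate stage (P ← P + tree output) that extends the tree to kernels with more than nine elements. It assumes, hypothetically, that the SLFP products have already been converted to fixed‐point integers before reaching the DSPs, and that a kernel whose size is not a multiple of nine is zero‐padded; the function names are illustrative, and DSP port usage (PCIN/PCOUT, OPMODE) and pipelining are not modelled.

```python
# Two signed 24-bit lanes packed into one 48-bit word, as in a TWO24 SIMD add.
LANE_BITS = 24
LANE_MASK = (1 << LANE_BITS) - 1
SIGN_BIT = 1 << (LANE_BITS - 1)
TREE_INPUTS = 9


def pack(hi: int, lo: int) -> int:
    """Place two signed integers into the upper and lower 24-bit lanes."""
    return ((hi & LANE_MASK) << LANE_BITS) | (lo & LANE_MASK)


def unpack(word: int) -> tuple[int, int]:
    """Recover the two signed lane values from a 48-bit word."""
    def to_signed(v: int) -> int:
        return v - (1 << LANE_BITS) if v & SIGN_BIT else v
    return to_signed((word >> LANE_BITS) & LANE_MASK), to_signed(word & LANE_MASK)


def simd_add(a: int, b: int) -> int:
    """Lane-wise add: the carry out of the low lane is dropped, not propagated."""
    lo = ((a & LANE_MASK) + (b & LANE_MASK)) & LANE_MASK
    hi = (((a >> LANE_BITS) & LANE_MASK) + ((b >> LANE_BITS) & LANE_MASK)) & LANE_MASK
    return (hi << LANE_BITS) | lo


def adder_tree_9(words: list[int]) -> int:
    """Cascaded chain of SIMD adds over exactly nine packed products."""
    if len(words) != TREE_INPUTS:
        raise ValueError("the tree takes exactly nine inputs")
    total = 0
    for w in words:
        total = simd_add(total, w)
    return total


def convolve_sum(words: list[int]) -> int:
    """Feed products through the tree in groups of nine; the last stage
    accumulates across groups. Short groups are zero-padded (an assumption)."""
    p = 0
    for start in range(0, len(words), TREE_INPUTS):
        group = words[start:start + TREE_INPUTS]
        group += [0] * (TREE_INPUTS - len(group))
        p = simd_add(p, adder_tree_9(group))
    return p


if __name__ == "__main__":
    products = [pack(i, -i) for i in range(1, 10)]
    print(unpack(adder_tree_9(products)))      # (45, -45)
    print(unpack(convolve_sum(products * 2)))  # (90, -90)
```

Because each lane wraps independently, a negative sum in the low lane never disturbs the high lane, which is what lets two output channels share one DSP chain.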

Funder

Fundamental Research Funds for the Central Universities

National Natural Science Foundation of China

Publisher

Wiley

Subject

Applied Mathematics; Electrical and Electronic Engineering; Computer Science Applications; Electronic, Optical and Magnetic Materials

