Low-precision Floating-point Arithmetic for High-performance FPGA-based CNN Acceleration

Author:

Wu Chen¹, Wang Mingyu¹, Chu Xinyuan¹, Wang Kun¹, He Lei¹

Affiliation:

1. Electrical and Computer Engineering, University of California, Westwood, Los Angeles, CA

Abstract

Low-precision data representation is important for reducing storage size and memory access in convolutional neural networks (CNNs). Yet existing methods have two major limitations: (1) they require re-training to maintain accuracy for deep CNNs, and (2) they need 16-bit floating-point or 8-bit fixed-point data for good accuracy. In this article, we propose a low-precision (8-bit) floating-point (LPFP) quantization method for FPGA-based acceleration to overcome these limitations. Without any re-training, LPFP finds an optimal 8-bit data representation with negligible top-1/top-5 accuracy loss (within 0.5%/0.3%, respectively, in our experiments, and significantly better than existing methods for deep CNNs). Furthermore, we implement one 8-bit LPFP multiplication with one 4-bit multiply-adder and one 3-bit adder, and can therefore implement four 8-bit LPFP multiplications using one DSP48E1 of the Xilinx Kintex-7 family or one DSP48E2 of the Xilinx UltraScale/UltraScale+ family, whereas one DSP can implement only two 8-bit fixed-point multiplications. Experiments on six typical CNNs for inference show that, on average, we improve throughput over existing FPGA accelerators. In particular, for VGG16 and YOLO, compared to six recent FPGA accelerators, we improve average throughput by 3.5× and 27.5× and average throughput per DSP by 4.1× and 5×, respectively.
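
The abstract's observation that a low-precision floating-point product needs only a narrow integer multiply plus a narrow add can be illustrated with a minimal Python sketch. It is not the authors' implementation: the 3-bit stored mantissa with a hidden leading bit (`man_bits=3`) is an assumption, the helper names `decompose`, `compose`, and `multiply` are hypothetical, and exponent-range handling (bias, saturation, subnormals) is omitted for brevity.

```python
import math

def decompose(x, man_bits=3):
    """Round x to (sign, exp, sig) with x ~= (-1)**sign * sig * 2**(exp - man_bits).

    sig is an integer significand that includes the hidden leading bit,
    so it fits in man_bits + 1 bits. man_bits=3 is an assumed split.
    """
    if x == 0.0:
        return 0, 0, 0
    sign = 1 if x < 0 else 0
    m, e = math.frexp(abs(x))              # abs(x) = m * 2**e, 0.5 <= m < 1
    sig = round(m * 2 ** (man_bits + 1))   # nearest integer significand
    exp = e - 1
    if sig == 2 ** (man_bits + 1):         # rounding carried into the next binade
        sig >>= 1
        exp += 1
    return sign, exp, sig

def compose(sign, exp, sig, man_bits=3):
    """Convert (sign, exp, sig) back to a Python float."""
    return (-1.0) ** sign * sig * 2.0 ** (exp - man_bits)

def multiply(a, b, man_bits=3):
    """Multiply two values after quantization using only small integer ops."""
    sa, ea, ma = decompose(a, man_bits)
    sb, eb, mb = decompose(b, man_bits)
    sign = sa ^ sb      # sign: one XOR
    exp = ea + eb       # exponents: one narrow integer add
    sig = ma * mb       # significands: one (man_bits+1)-bit integer multiply
    # The unrounded product carries 2 * man_bits fractional bits.
    return (-1.0) ** sign * sig * 2.0 ** (exp - 2 * man_bits)
```

For example, `compose(*decompose(1.3))` rounds 1.3 to 1.25, and `multiply(1.3, -0.3)` returns the exact product of the two quantized operands. A hardware design would additionally round the product back to the 8-bit format and clamp the exponent, which this sketch leaves out.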

Publisher

Association for Computing Machinery (ACM)

Subject

General Computer Science

Cited by 20 articles.

1. Refine to the essence: Less-redundant skill learning via diversity clustering;Engineering Applications of Artificial Intelligence;2024-07

2. A Case for Low Bitwidth Floating Point Arithmetic on FPGA for Transformer Based DNN Inference;2024 IEEE International Parallel and Distributed Processing Symposium Workshops (IPDPSW);2024-05-27

3. An FPGA-based Multi-Core Overlay Processor for Transformer-based Models;2024 2nd International Symposium of Electronics Design Automation (ISEDA);2024-05-10

4. A Highly Accurate and Parallel Vision MLP FPGA Accelerator based on FP7/8 SIMD Operations;2023 IEEE 16th International Symposium on Embedded Multicore/Many-core Systems-on-Chip (MCSoC);2023-12-18

5. Optimizing FPGA-Based DCN Accelerator with On-Chip Dataflow Reordering and Serial-Parallel Computing Array;2023 International Conference on High Performance Big Data and Intelligent Systems (HDIS);2023-12-06
