Affiliation:
1. University of Toronto, Toronto, Ontario, Canada
Abstract
Reducing the precision of deep neural network (DNN) inference accelerators can yield large efficiency gains with little or no accuracy degradation compared to half- or single-precision floating point, by enabling more multiplication operations per unit area. A wide range of precisions falls on the Pareto-optimal curve of hardware efficiency vs. accuracy, with no single precision dominating, which makes the variable-precision capabilities of FPGAs very valuable. We propose three types of logic block architectural enhancements and fully evaluate a total of six architectures that improve the area efficiency of multiplications and additions implemented in the soft fabric. Increasing the LUT fracturability and adding two adders to the ALM (the 4-bit Adder Double Chain architecture) leads to a 1.5× area reduction for arithmetic-heavy machine learning (ML) kernels, while increasing their speed. In addition, this architecture also reduces the logic area of general applications by 6%, while increasing the critical path delay by only 1%. However, our highest-impact option, which adds a 9-bit shadow multiplier to the logic clusters, reduces the area and critical path delay of ML kernels by 2.4× and 1.2×, respectively. These large gains come at the cost of a 15% logic area increase for general applications.
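The abstract's premise is that DNN inference tolerates low-precision integer arithmetic well. A minimal Python sketch of that idea is below: a dot product is quantized to 8-bit and 4-bit signed integers, accumulated in integer arithmetic, and dequantized once at the end. The symmetric per-tensor quantization scheme and scale factors here are illustrative assumptions, not the paper's method.

```python
# Illustrative sketch (not from the paper): low-precision integer
# multiply-accumulate, as used in reduced-precision DNN inference.
import numpy as np

def quantize(x, bits):
    """Symmetric uniform quantization of x to signed `bits`-bit integers."""
    qmax = 2 ** (bits - 1) - 1           # e.g. 127 for 8-bit, 7 for 4-bit
    scale = np.max(np.abs(x)) / qmax     # per-tensor scale (an assumption)
    q = np.clip(np.round(x / scale), -qmax, qmax).astype(np.int32)
    return q, scale

rng = np.random.default_rng(0)
w = rng.standard_normal(256).astype(np.float32)   # stand-in weights
a = rng.standard_normal(256).astype(np.float32)   # stand-in activations

ref = float(np.dot(w, a))                         # float32 reference
for bits in (8, 4):
    qw, sw = quantize(w, bits)
    qa, sa = quantize(a, bits)
    # Integer multiply-accumulate; a single dequantization at the end.
    approx = int(np.dot(qw, qa)) * sw * sa
    print(f"{bits}-bit result: {approx:.4f} "
          f"(ref {ref:.4f}, rel. err {abs(approx - ref) / abs(ref):.2%})")
```

Because multiplier hardware cost grows roughly quadratically with operand width, each such drop in precision lets substantially more multipliers fit per unit area, which is the efficiency lever the proposed logic block enhancements target.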
Publisher
Association for Computing Machinery (ACM)
Cited by
21 articles.