DyRecMul: Fast and Low-Cost Approximate Multiplier for FPGAs using Dynamic Reconfiguration

Published: 2024-05
ISSN: 1936-7406
Container-title: ACM Transactions on Reconfigurable Technology and Systems
Language: en
Short-container-title: ACM Trans. Reconfigurable Technol. Syst.

Author:

Vakili Shervin¹, Vaziri Mobin², Zarei Amirhossein¹, Langlois J.M. Pierre²

Affiliation:

1. Institut national de la recherche scientifique, Énergie Matériaux Télécommunications Centre, Montreal, Canada

2. Polytechnique Montréal, Montreal, Canada

Abstract

Multipliers are widely used arithmetic operators in digital signal processing and machine learning circuits. Due to their relatively high complexity, they can have high latency and be a significant source of power consumption. One strategy to alleviate these limitations is to use approximate computing. This paper introduces an original FPGA-based approximate multiplier specifically optimized for machine learning computations. It uses dynamically reconfigurable lookup table (LUT) primitives in AMD-Xilinx technology to realize the core part of the computations. The paper provides an in-depth analysis of the hardware architecture, implementation outcomes, and accuracy evaluations of the proposed multiplier at INT8 precision. The paper also facilitates the generalization of the proposed approximate multiplier idea to other datatypes, providing analysis and estimations for hardware cost and accuracy as a function of multiplier parameters. Implementation results on an AMD-Xilinx Kintex UltraScale+ FPGA demonstrate savings of 64% and 67% in LUT utilization for signed multiplication and multiply-and-accumulation configurations, respectively, when compared to the standard Xilinx multiplier core. Accuracy measurements on four popular deep learning (DL) benchmarks indicate an average accuracy decrease of less than 0.29% during post-training deployment, with the maximum reduction remaining below 0.33%. The source code of this work is available on GitHub.
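
The abstract describes accuracy as a function of multiplier parameters. As a rough illustration of how such an evaluation can be set up, the Python sketch below exhaustively compares a generic approximate signed INT8 multiplier against the exact product. The truncation scheme, the names `approx_mul_int8` and `error_stats`, and the `keep_bits` parameter are assumptions made for this example only; they are not the DyRecMul architecture, which relies on dynamically reconfigurable LUT primitives.

```python
def approx_mul_int8(a: int, b: int, keep_bits: int = 4) -> int:
    """Hypothetical approximate signed multiply (NOT the DyRecMul design).

    Each operand magnitude is truncated to its `keep_bits` most significant
    bits, counted from the leading one, and the truncated values are then
    multiplied exactly.
    """
    sign = -1 if (a < 0) != (b < 0) else 1

    def truncate(m: int) -> int:
        # Number of low-order bits to drop; zero when m already fits.
        shift = max(m.bit_length() - keep_bits, 0)
        return (m >> shift) << shift

    return sign * truncate(abs(a)) * truncate(abs(b))


def error_stats(keep_bits: int = 4) -> tuple[float, float]:
    """Return (mean, max) relative error over all nonzero INT8 products."""
    total_rel = 0.0
    max_rel = 0.0
    count = 0
    for a in range(-128, 128):
        for b in range(-128, 128):
            exact = a * b
            if exact == 0:
                continue  # relative error is undefined for a zero product
            rel = abs(approx_mul_int8(a, b, keep_bits) - exact) / abs(exact)
            total_rel += rel
            max_rel = max(max_rel, rel)
            count += 1
    return total_rel / count, max_rel


if __name__ == "__main__":
    for k in (3, 4, 5, 8):
        mean_rel, max_rel = error_stats(k)
        print(f"keep_bits={k}: mean={mean_rel:.4f}, max={max_rel:.4f}")
```

Sweeping `keep_bits` in this way mirrors, in a simplified form, the idea of trading accuracy against hardware cost: fewer retained bits imply a smaller exact multiplier but a larger error. The figures produced by this sketch say nothing about the accuracy reported in the paper.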

Publisher

Association for Computing Machinery (ACM)

Link

https://dl.acm.org/doi/pdf/10.1145/3663480

References (48 articles)

1. Energy and area efficient imprecise compressors for approximate multiplication at nanoscale

2. Dual-Quality 4:2 Compressors for Utilizing in Dynamic Accuracy Configurable Multipliers

3. A Hardware-Efficient Logarithmic Multiplier with Improved Accuracy

4. An Improved Logarithmic Multiplier for Energy-Efficient Neural Computing

5. Mohammad Saeed Ansari, Honglan Jiang, Bruce F. Cockburn, and Jie Han. 2018. Low-power approximate multipliers using encoded partial products and approximate compressors. IEEE Journal on Emerging and Selected Topics in Circuits and Systems 8, 3 (2018), 404–416.

Cited by 1 article.

1. A Unified Hardware Design for Multiplication, Division, and Square Roots Using Binary Logarithms. Symmetry, 2024-09-02.