Affiliation:
1. Università degli Studi di Trieste, Italy and Instituto Tecnológico de Costa Rica, Costa Rica
2. Instituto Tecnológico de Costa Rica, Costa Rica
Abstract
Low power budgets and scarce computational resources limit computation at the edge. The approximate computing paradigm offers promising techniques for designing accelerators that cope with these inherent limitations, and high-level synthesis (HLS) with C++ opens the opportunity to use meta-programming for specialisable generic designs. This work proposes a framework for automatically generating synthesis-time configurable processing elements (PEs) for matrix multiplication-addition (GEMMA) and convolution. To evaluate our work, we perform a design exploration varying data bit-width, operand sizes, and kernel sizes. Our analyses include resource consumption scaling, clocks-to-solution, design efficiency, and error distribution, presenting a comprehensive view of how the parameters affect the properties of our generic implementations. The GEMMA PEs present a trade-off between granularity and efficiency: design efficiency favours large PEs with short data widths, which achieve, theoretically, up to 75 GMAC/s on a Xilinx XC7Z020 at 100 MHz with an efficiency of 27%. To quantify design efficiency, we propose a figure of merit that evaluates operations per second and resource utilisation with respect to the maximum achievable by the FPGA. Regarding the convolution PEs, we implemented two algorithms: a window-based spatial convolution and Winograd. The former is the best in terms of performance with 150 GMAC/s, reaching up to 47% efficiency. Winograd, in turn, performed better numerically with a 3×3 kernel filter, presenting a mean error of 11.01% with 4-bit operands and a PSNR of 16.28 dB, compared with a mean error of 38.2% and a PSNR of 5.89 dB for the spatial convolution. Finally, we discuss how the error depends mostly on the PE’s parameters. In the GEMMA, the error depends on the matrix size, which limits PE scaling, although the PEs remain applicable to accelerators. The PEs developed during this research will enable further research on granular approximate accelerators.
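As a rough illustration of what a synthesis-time configurable GEMMA PE could look like, the sketch below expresses D = A·B + C as a C++ template whose element type, matrix order, and accumulator type are fixed at compile time. This is a minimal sketch, not the framework described in the abstract: the names `Matrix`, `gemma`, `Acc`, and `gemma_example` are hypothetical, and plain integer types stand in for the arbitrary-precision types an HLS tool would normally provide.

```cpp
#include <array>
#include <cassert>
#include <cstddef>

// Hypothetical square-matrix alias: element type T and order N are
// compile-time parameters, mirroring "synthesis-time" configuration.
template <typename T, std::size_t N>
using Matrix = std::array<std::array<T, N>, N>;

// GEMMA sketch: D = A * B + C.
// Acc is an assumed accumulator type; choosing a wider Acc than T reduces
// overflow, while a narrow Acc trades accuracy for resources.
template <typename T, std::size_t N, typename Acc = T>
Matrix<T, N> gemma(const Matrix<T, N>& a, const Matrix<T, N>& b,
                   const Matrix<T, N>& c) {
  Matrix<T, N> d{};
  for (std::size_t i = 0; i < N; ++i) {
    for (std::size_t j = 0; j < N; ++j) {
      // Start from the addend C, then accumulate the dot product.
      Acc acc = static_cast<Acc>(c[i][j]);
      for (std::size_t k = 0; k < N; ++k) {
        acc += static_cast<Acc>(a[i][k]) * static_cast<Acc>(b[k][j]);
      }
      d[i][j] = static_cast<T>(acc);
    }
  }
  return d;
}

// Usage example: a 2x2 integer instance checked against a hand-computed result.
inline void gemma_example() {
  const Matrix<int, 2> a{{{1, 2}, {3, 4}}};
  const Matrix<int, 2> b{{{5, 6}, {7, 8}}};
  const Matrix<int, 2> c{{{1, 1}, {1, 1}}};
  const Matrix<int, 2> d = gemma(a, b, c);
  assert(d[0][0] == 20 && d[0][1] == 23);
  assert(d[1][0] == 44 && d[1][1] == 51);
}
```

Because `T`, `N`, and `Acc` are template parameters, each instantiation is a distinct, fully unrolled-able design point, which is the kind of knob a design exploration over bit-width and operand size would sweep.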
Funder
Programmi Operativi Nazionali
Ministero dell’Università e della Ricerca
eXact Lab S.R.L
Instituto Tecnológico de Costa Rica
Master’s scholarship programme from RidgeRun Embedded Solutions LLC
Publisher
Association for Computing Machinery (ACM)
Subject
Hardware and Architecture, Software
Cited by
4 articles.
1. Improving Netlist Transformation-Based Approximate Logic Synthesis Through Resynthesis;IEEE Embedded Systems Letters;2024-09
2. A User-Friendly Ecosystem for AI FPGA-Based Accelerators;2024 IEEE International Conference on Omni-layer Intelligent Systems (COINS);2024-07-29
3. Acceleration of Fully Connected Layers on FPGA using the Strassen Matrix Multiplication;2023 IEEE 5th International Conference on BioInspired Processing (BIP);2023-11-28
4. Generic Accuracy Configurable Matrix Multiplication-Addition Accelerator using HLS;2023 53rd Annual IEEE/IFIP International Conference on Dependable Systems and Networks Workshops (DSN-W);2023-06