Automatic Generation of Resource and Accuracy Configurable Processing Elements

Published:2023-07-24 Issue:4 Volume:22 Page:1-27
ISSN:1539-9087
Container-title:ACM Transactions on Embedded Computing Systems
language:en
Short-container-title:ACM Trans. Embed. Comput. Syst.

Author:

León-Vega Luis G.¹, Salazar-Villalobos Eduardo², Rodriguez-Figueroa Alejandro², Castro-Godínez Jorge²

Affiliation:

1. Università degli Studi di Trieste, Italy and Instituto Tecnológico de Costa Rica, Costa Rica

2. Instituto Tecnológico de Costa Rica, Costa Rica

Abstract

Low-power constraints and scarce computational resources limit computation at the edge. The approximate computing paradigm offers promising techniques for designing accelerators that cope with these inherent limitations, and high-level synthesis with C++ opens the opportunity to use meta-programming for specialisable generic designs. This work proposes a framework for automatically generating synthesis-time configurable processing elements (PEs) for matrix multiplication-addition (GEMMA) and convolution. To evaluate our work, we perform a design exploration that varies data bit-width, operand sizes, and kernel sizes. Our analyses include resource consumption scaling, clocks-to-solution, design efficiency, and error distribution, presenting a comprehensive view of how the parameters affect the properties of our generic implementations. The GEMMA presented a trade-off between granularity and efficiency, where large PEs with short data widths are favoured by the design efficiency, achieving, theoretically, up to 75 GMAC/s on a Xilinx XC7Z020 @ 100 MHz with an efficiency of 27%. For design efficiency, we propose a figure of merit to evaluate operations per second and resource utilisation with respect to the maximum achievable by the FPGA. Regarding the convolution PEs, we implemented two algorithms: a window-based spatial convolution and Winograd. The former is the best in terms of performance, with 150 GMAC/s and an efficiency of up to 47%. Winograd, in turn, was numerically more accurate with a 3×3 kernel filter, presenting a mean error of 11.01% with 4-bit operands and a PSNR of 16.28 dB, compared to the spatial convolution with a mean error of 38.2% and a PSNR of 5.89 dB. Finally, we discuss how the error is mostly dependent on the PE’s parameters. In the GEMMA, the error depends on the matrix size, which limits PE scaling, although the PEs remain applicable to accelerators. The PEs developed during this research will enable further research on granular approximate accelerators.
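The relationship between operand bit-width and numerical error described in the abstract can be illustrated with a small software model. The following is only a sketch under stated assumptions, not the authors' framework: the function names (`quantise`, `gemma`, `psnr`, `evaluate`), the signed fixed-point format over [-1, 1), the choice of PSNR peak value, and the fact that only the operands (not the accumulator or output) are quantised are all hypothetical choices made for illustration.

```python
import math
import random


def quantise(x, bits):
    """Round x to a signed fixed-point value in [-1, 1) with `bits` bits (assumed format)."""
    scale = 2 ** (bits - 1)
    q = round(x * scale)
    q = max(-scale, min(scale - 1, q))  # saturate to the representable range
    return q / scale


def gemma(a, b, c):
    """Return D = A*B + C for square matrices given as lists of lists."""
    n = len(a)
    return [[sum(a[i][k] * b[k][j] for k in range(n)) + c[i][j]
             for j in range(n)] for i in range(n)]


def psnr(ref, approx, peak):
    """Peak signal-to-noise ratio in dB between a reference and an approximation."""
    n = len(ref)
    mse = sum((ref[i][j] - approx[i][j]) ** 2
              for i in range(n) for j in range(n)) / (n * n)
    return math.inf if mse == 0 else 10 * math.log10(peak ** 2 / mse)


def evaluate(n, bits, seed=0):
    """PSNR of an n-by-n GEMMA whose operands are quantised to `bits` bits."""
    rng = random.Random(seed)
    a, b, c = ([[rng.uniform(-1, 1) for _ in range(n)] for _ in range(n)]
               for _ in range(3))
    ref = gemma(a, b, c)
    qa, qb, qc = ([[quantise(v, bits) for v in row] for row in m]
                  for m in (a, b, c))
    approx = gemma(qa, qb, qc)
    peak = n + 1  # assumed peak: |D| <= n*1*1 + 1 for operands in [-1, 1]
    return psnr(ref, approx, peak)


if __name__ == "__main__":
    for bits in (4, 8, 12):
        print(f"{bits}-bit operands, 4x4 GEMMA: PSNR = {evaluate(4, bits):.2f} dB")
```

Running the script for several bit-widths and matrix sizes gives a rough feel for the kind of design exploration the abstract describes; the absolute numbers are not comparable to the reported results, since the hardware PEs and the paper's error metrics may be defined differently.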

Funder

Programmi Operativi Nazionali

Ministero dell’Università e della Ricerca

eXact Lab S.R.L

Instituto Tecnológico de Costa Rica

Master’s scholarship programme from RidgeRun Embedded Solutions LLC

Publisher

Association for Computing Machinery (ACM)

Subject

Hardware and Architecture, Software

Link

https://dl.acm.org/doi/pdf/10.1145/3594540