Automatic Generation of Resource and Accuracy Configurable Processing Elements

Author:

León-Vega Luis G.¹, Salazar-Villalobos Eduardo², Rodriguez-Figueroa Alejandro², Castro-Godínez Jorge²

Affiliation:

1. Università degli Studi di Trieste, Italy and Instituto Tecnológico de Costa Rica, Costa Rica

2. Instituto Tecnológico de Costa Rica, Costa Rica

Abstract

Low-power requirements and scarce computational resources limit computation at the edge. The approximate computing paradigm offers promising techniques for designing accelerators that cope with these inherent limitations, and high-level synthesis with C++ opens the opportunity to use meta-programming for specialisable generic designs. This work proposes a framework for automatically generating synthesis-time configurable processing elements (PEs) for matrix multiplication-addition (GEMMA) and convolution. To evaluate our work, we perform a design exploration by varying data bit-width, operand sizes, and kernel sizes. Our analyses cover resource consumption scaling, clocks-to-solution, design efficiency, and error distribution, presenting a comprehensive view of how the parameters affect the properties of our generic implementations. For design efficiency, we propose a figure of merit that evaluates operations per second and resource utilisation with respect to the maximum achievable by the FPGA. The GEMMA PEs present a trade-off between granularity and efficiency: design efficiency favours large PEs with short data widths, which theoretically achieve up to 75 GMAC/s on a Xilinx XC7Z020 at 100 MHz with an efficiency of 27%. For the convolution PEs, we implemented two algorithms: a window-based spatial convolution and Winograd. The former performs best, with 150 GMAC/s and an efficiency of up to 47%. Winograd is the more accurate of the two with a 3×3 kernel filter, presenting a mean error of 11.01% with 4-bit operands and a PSNR of 16.28 dB, compared to a mean error of 38.2% and a PSNR of 5.89 dB for the spatial convolution. Finally, we discuss how the error depends mostly on the PE's parameters. In the GEMMA, the error depends on the matrix size, which limits PE scaling but still leaves the PEs applicable to accelerators. The PEs developed during this research will support further research on granular approximate accelerators.
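The idea of a synthesis-time configurable PE can be illustrated with C++ templates, where the data bit-width and operand size are compile-time parameters. The following is a minimal sketch under stated assumptions, not the authors' framework: the names `quantise`, `gemma_pe`, and `psnr_db` are hypothetical, fixed-point arithmetic is emulated in plain `double` rather than with a vendor HLS type, operands are assumed to lie in [-1, 1), and the PSNR uses an assumed peak value of 1.0, which may differ from the definition used in the paper.

```cpp
#include <array>
#include <cassert>
#include <cmath>
#include <cstddef>

// Hypothetical helper (not from the paper): round x to a signed fixed-point
// grid with Bits total bits (1 sign bit, Bits-1 fractional bits), saturating
// to the representable range [-1, 1 - 2^-(Bits-1)].
template <int Bits>
double quantise(double x) {
  static_assert(Bits >= 2 && Bits <= 16, "bit-width outside the sketched range");
  const double scale = static_cast<double>(1 << (Bits - 1));
  double q = std::round(x * scale);
  if (q > scale - 1.0) q = scale - 1.0;  // saturate on positive overflow
  if (q < -scale) q = -scale;            // saturate on negative overflow
  return q / scale;
}

template <std::size_t N>
using Matrix = std::array<std::array<double, N>, N>;

// Sketch of a GEMMA processing element: D = A * B + C on N x N operands.
// Inputs are quantised to Bits bits; the accumulator is kept at full
// precision here, which is an assumption of this sketch.
template <int Bits, std::size_t N>
Matrix<N> gemma_pe(const Matrix<N>& a, const Matrix<N>& b, const Matrix<N>& c) {
  Matrix<N> d{};
  for (std::size_t i = 0; i < N; ++i) {
    for (std::size_t j = 0; j < N; ++j) {
      double acc = quantise<Bits>(c[i][j]);
      for (std::size_t k = 0; k < N; ++k) {
        acc += quantise<Bits>(a[i][k]) * quantise<Bits>(b[k][j]);
      }
      d[i][j] = acc;
    }
  }
  return d;
}

// PSNR in dB between a reference and an approximate result, assuming a peak
// signal value of 1.0. Undefined when both inputs are identical (MSE == 0).
template <std::size_t N>
double psnr_db(const Matrix<N>& ref, const Matrix<N>& approx) {
  double mse = 0.0;
  for (std::size_t i = 0; i < N; ++i) {
    for (std::size_t j = 0; j < N; ++j) {
      const double diff = ref[i][j] - approx[i][j];
      mse += diff * diff;
    }
  }
  mse /= static_cast<double>(N * N);
  assert(mse > 0.0 && "PSNR is undefined for identical inputs");
  return 10.0 * std::log10(1.0 / mse);
}
```

Instantiating `gemma_pe<4, 2>` and `gemma_pe<16, 2>` on the same inputs and comparing each result against a double-precision reference with `psnr_db` gives a rough feel for how bit-width trades accuracy for resources. In an actual HLS flow the same template parameters would typically select a fixed-point hardware type instead of the emulation used here.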

Funder

Programmi Operativi Nazionali

Ministero dell’Università e della Ricerca

eXact Lab S.R.L

Instituto Tecnológico de Costa Rica

Master’s scholarship programme from RidgeRun Embedded Solutions LLC

Publisher

Association for Computing Machinery (ACM)

Subject

Hardware and Architecture, Software

Cited by 4 articles.

1. Improving Netlist Transformation-Based Approximate Logic Synthesis Through Resynthesis;IEEE Embedded Systems Letters;2024-09

2. A User-Friendly Ecosystem for AI FPGA-Based Accelerators;2024 IEEE International Conference on Omni-layer Intelligent Systems (COINS);2024-07-29

3. Acceleration of Fully Connected Layers on FPGA using the Strassen Matrix Multiplication;2023 IEEE 5th International Conference on BioInspired Processing (BIP);2023-11-28

4. Generic Accuracy Configurable Matrix Multiplication-Addition Accelerator using HLS;2023 53rd Annual IEEE/IFIP International Conference on Dependable Systems and Networks Workshops (DSN-W);2023-06
