Energy-Performance Considerations for Data Offloading to FPGA-Based Accelerators Over PCIe

Authors:

Mbakoyiannis Dimitrios 1, Tomoutzoglou Othon 1, Kornaros George 1 (ORCID)

Affiliation:

1. Technological Educational Institute of Crete, Crete, Greece

Abstract

Modern data centers increasingly employ FPGA-based heterogeneous acceleration platforms as a result of their great potential for continued performance and energy efficiency. Today, FPGAs provide more hardware parallelism than is possible with GPUs or CPUs, while C-like programming environments shorten development time to nearly software cycles. In this work, we address the limitations and overheads of accessing and transferring data to accelerators over common CPU-accelerator interconnects such as PCIe. We present three different FPGA accelerator dispatching methods for streaming applications (e.g., multimedia, vision computing). The first uses zero-copy data transfers and on-chip scratchpad memory (SPM) for energy efficiency; the second also uses zero-copy transfers but shares copy engines among different accelerator instances and relies on local external memory. The third uses the processor’s memory management unit to acquire the physical addresses of user pages and performs scatter-gather data transfers with SPM. Even though all techniques exhibit advantages in terms of scalability and relieve the processor of control overheads by using integrated schedulers, the first method delivers the most energy-efficient acceleration for streaming applications.
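To make the third dispatching method more concrete, the following is a minimal, hypothetical Linux-driver sketch of how a user buffer could be resolved through the MMU, pinned, and exposed to an FPGA DMA engine over PCIe as a scatter-gather list. The helper name fpga_map_user_buffer and the error-handling layout are illustrative assumptions, not the driver interface used in the paper; only standard kernel primitives (get_user_pages_fast, sg_alloc_table_from_pages, dma_map_sg) are assumed.

#include <linux/kernel.h>
#include <linux/mm.h>
#include <linux/scatterlist.h>
#include <linux/dma-mapping.h>
#include <linux/slab.h>

/* Pin the pages backing a user buffer and build a DMA-mapped
 * scatter-gather table that an FPGA copy engine can consume. */
static int fpga_map_user_buffer(struct device *dev, unsigned long uaddr,
                                size_t len, struct sg_table *sgt)
{
        unsigned int n_pages = DIV_ROUND_UP(offset_in_page(uaddr) + len,
                                            PAGE_SIZE);
        struct page **pages;
        int pinned, ret;

        pages = kcalloc(n_pages, sizeof(*pages), GFP_KERNEL);
        if (!pages)
                return -ENOMEM;

        /* Resolve the physical pages through the MMU and pin them. */
        pinned = get_user_pages_fast(uaddr, n_pages, FOLL_WRITE, pages);
        if (pinned != (int)n_pages) {
                ret = -EFAULT;
                goto out_put;
        }

        /* Merge physically contiguous pages into scatter-gather entries. */
        ret = sg_alloc_table_from_pages(sgt, pages, n_pages,
                                        offset_in_page(uaddr), len,
                                        GFP_KERNEL);
        if (ret)
                goto out_put;

        /* Produce bus addresses the accelerator's DMA engine can use
         * directly, with no intermediate kernel bounce buffer (zero copy). */
        if (!dma_map_sg(dev, sgt->sgl, sgt->nents, DMA_BIDIRECTIONAL)) {
                sg_free_table(sgt);
                ret = -EIO;
                goto out_put;
        }

        kfree(pages);
        return 0;

out_put:
        while (pinned-- > 0)
                put_page(pages[pinned]);
        kfree(pages);
        return ret;
}

On completion the driver would unmap the table with dma_unmap_sg and release the pinned pages; the scratchpad-backed variants described in the abstract differ mainly in where the accelerator stages the streamed data, not in this host-side mapping step.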

Publisher

Association for Computing Machinery (ACM)

Subject

Hardware and Architecture, Information Systems, Software

Cited by 10 articles.

1. Flexible Updating of Internet of Things Computing Functions through Optimizing Dynamic Partial Reconfiguration; ACM Transactions on Embedded Computing Systems; 2024-03-18

2. Fair Resource Allocation in Virtualized O-RAN Platforms; Proceedings of the ACM on Measurement and Analysis of Computing Systems; 2024-02-16

3. Theoretical Validation and Hardware Implementation of Dynamic Adaptive Scheduling for Heterogeneous Systems on Chip; Journal of Low Power Electronics and Applications; 2023-10-17

4. Virtualizing a Post-Moore’s Law Analog Mesh Processor: The Case of a Photonic PDE Accelerator; ACM Transactions on Embedded Computing Systems; 2023-01-24

5. Portrait: A holistic computation and bandwidth balanced performance evaluation model for heterogeneous systems; Sustainable Computing: Informatics and Systems; 2022-09
