In-Depth Analysis on Microarchitectures of Modern Heterogeneous CPU-FPGA Platforms

Author:

Choi Young-Kyu1,Cong Jason1,Fang Zhenman1,Hao Yuchen1,Reinman Glenn1,Wei Peng1

Affiliation:

1. Center for Domain-Specific Computing, University of California, Los Angeles, CA, USA

Abstract

Conventional homogeneous multicore processors are not able to provide the continued performance and energy improvement that we have expected from past endeavors. Heterogeneous architectures that feature specialized hardware accelerators are widely considered a promising paradigm for resolving this issue. Among different heterogeneous devices, FPGAs that can be reconfigured to accelerate a broad class of applications with orders-of-magnitude performance/watt gains, are attracting increased attention from both academia and industry. As a consequence, a variety of CPU-FPGA acceleration platforms with diversified microarchitectural features have been supplied by industry vendors. Such diversity, however, poses a serious challenge to application developers in selecting the appropriate platform for a specific application or application domain. This article aims to address this challenge by determining which microarchitectural characteristics affect performance, and in what ways. Specifically, we conduct a quantitative comparison and an in-depth analysis on five state-of-the-art CPU-FPGA acceleration platforms: (1) the Alpha Data board and (2) the Amazon F1 instance that represent the traditional PCIe-based platform with private device memory; (3) the IBM CAPI that represents the PCIe-based system with coherent shared memory; (4) the first generation of the Intel Xeon+FPGA Accelerator Platform that represents the QPI-based system with coherent shared memory; and (5) the second generation of the Intel Xeon+FPGA Accelerator Platform that represents a hybrid PCIe-based (non-coherent) and QPI-based (coherent) system with shared memory. Based on the analysis of their CPU-FPGA communication latency and bandwidth characteristics, we provide a series of insights for both application developers and platform designers. Furthermore, we conduct two case studies to demonstrate how these insights can be leveraged to optimize accelerator designs. The microbenchmarks used for evaluation have been released for public use.

Funder

NSF/Intel InTrans

NSF/Intel CAPA

Publisher

Association for Computing Machinery (ACM)

Subject

General Computer Science

Reference31 articles.

1. Jeff Burt. 2016. Intel to Start Shipping Xeons with FPGAs in Early 2016. Retrieved from http://www.eweek.com/servers/intel-to-start-shipping-xeons-with-fpgas-in-early-2016.html. Jeff Burt. 2016. Intel to Start Shipping Xeons with FPGAs in Early 2016. Retrieved from http://www.eweek.com/servers/intel-to-start-shipping-xeons-with-fpgas-in-early-2016.html.

2. Amazon. 2017. Amazon EC2 F1 Instance. Retrieved from https://aws.amazon.com/ec2/instance-types/f1/. Amazon. 2017. Amazon EC2 F1 Instance. Retrieved from https://aws.amazon.com/ec2/instance-types/f1/.

3. Xilinx. 2017. SDAccel Development Environment. Retrieved from http://www.xilinx.com/products/design-tools/software-zone/sdaccel.html. Xilinx. 2017. SDAccel Development Environment. Retrieved from http://www.xilinx.com/products/design-tools/software-zone/sdaccel.html.

4. CCIX. 2018. Cache Coherent Interconnect for Accelerators. Retrieved from https://www.ccixconsortium.com/. CCIX. 2018. Cache Coherent Interconnect for Accelerators. Retrieved from https://www.ccixconsortium.com/.

5. Instruction Set Innovations for the Convey HC-1 Computer

Cited by 25 articles. 订阅此论文施引文献 订阅此论文施引文献,注册后可以免费订阅5篇论文的施引文献,订阅后可以查看论文全部施引文献

1. Analysis and efficient implementation of IEEE-754 decimal floating point adders/subtractors in FPGAs for DPD and BID encoding;The Journal of Supercomputing;2023-12-06

2. Theoretical Validation and Hardware Implementation of Dynamic Adaptive Scheduling for Heterogeneous Systems on Chip;Journal of Low Power Electronics and Applications;2023-10-17

3. Heterogeneous Reconfigurable Accelerators: Trends and Perspectives;2023 60th ACM/IEEE Design Automation Conference (DAC);2023-07-09

4. CPU-free Computing: A Vision with a Blueprint;Proceedings of the 19th Workshop on Hot Topics in Operating Systems;2023-06-22

5. Rambda: RDMA-driven Acceleration Framework for Memory-intensive µs-scale Datacenter Applications;2023 IEEE International Symposium on High-Performance Computer Architecture (HPCA);2023-02

同舟云学术

1.学者识别学者识别

2.学术分析学术分析

3.人才评估人才评估

"同舟云学术"是以全球学者为主线,采集、加工和组织学术论文而形成的新型学术文献查询和分析系统,可以对全球学者进行文献检索和人才价值评估。用户可以通过关注某些学科领域的顶尖人物而持续追踪该领域的学科进展和研究前沿。经过近期的数据扩容,当前同舟云学术共收录了国内外主流学术期刊6万余种,收集的期刊论文及会议论文总量共计约1.5亿篇,并以每天添加12000余篇中外论文的速度递增。我们也可以为用户提供个性化、定制化的学者数据。欢迎来电咨询!咨询电话:010-8811{复制后删除}0370

www.globalauthorid.com

TOP

Copyright © 2019-2024 北京同舟云网络信息技术有限公司
京公网安备11010802033243号  京ICP备18003416号-3