In-Depth Analysis on Microarchitectures of Modern Heterogeneous CPU-FPGA Platforms-Reference-Cited by-同舟云学术

In-Depth Analysis on Microarchitectures of Modern Heterogeneous CPU-FPGA Platforms

Published:2019-04-05 Issue:1 Volume:12 Page:1-20
ISSN:1936-7406
Container-title:ACM Transactions on Reconfigurable Technology and Systems
language:en
Short-container-title:ACM Trans. Reconfigurable Technol. Syst.

Author:

Choi Young-Kyu¹,Cong Jason¹,Fang Zhenman¹,Hao Yuchen¹,Reinman Glenn¹,Wei Peng¹

Affiliation:

1. Center for Domain-Specific Computing, University of California, Los Angeles, CA, USA

Abstract

Conventional homogeneous multicore processors are not able to provide the continued performance and energy improvement that we have expected from past endeavors. Heterogeneous architectures that feature specialized hardware accelerators are widely considered a promising paradigm for resolving this issue. Among different heterogeneous devices, FPGAs that can be reconfigured to accelerate a broad class of applications with orders-of-magnitude performance/watt gains, are attracting increased attention from both academia and industry. As a consequence, a variety of CPU-FPGA acceleration platforms with diversified microarchitectural features have been supplied by industry vendors. Such diversity, however, poses a serious challenge to application developers in selecting the appropriate platform for a specific application or application domain. This article aims to address this challenge by determining which microarchitectural characteristics affect performance, and in what ways. Specifically, we conduct a quantitative comparison and an in-depth analysis on five state-of-the-art CPU-FPGA acceleration platforms: (1) the Alpha Data board and (2) the Amazon F1 instance that represent the traditional PCIe-based platform with private device memory; (3) the IBM CAPI that represents the PCIe-based system with coherent shared memory; (4) the first generation of the Intel Xeon+FPGA Accelerator Platform that represents the QPI-based system with coherent shared memory; and (5) the second generation of the Intel Xeon+FPGA Accelerator Platform that represents a hybrid PCIe-based (non-coherent) and QPI-based (coherent) system with shared memory. Based on the analysis of their CPU-FPGA communication latency and bandwidth characteristics, we provide a series of insights for both application developers and platform designers. Furthermore, we conduct two case studies to demonstrate how these insights can be leveraged to optimize accelerator designs. The microbenchmarks used for evaluation have been released for public use.

Funder

NSF/Intel InTrans

NSF/Intel CAPA

Publisher

Association for Computing Machinery (ACM)

Subject

General Computer Science

Link

https://dl.acm.org/doi/pdf/10.1145/3294054

Reference31 articles.

1. Jeff Burt. 2016. Intel to Start Shipping Xeons with FPGAs in Early 2016. Retrieved from http://www.eweek.com/servers/intel-to-start-shipping-xeons-with-fpgas-in-early-2016.html. Jeff Burt. 2016. Intel to Start Shipping Xeons with FPGAs in Early 2016. Retrieved from http://www.eweek.com/servers/intel-to-start-shipping-xeons-with-fpgas-in-early-2016.html.

2. Amazon. 2017. Amazon EC2 F1 Instance. Retrieved from https://aws.amazon.com/ec2/instance-types/f1/. Amazon. 2017. Amazon EC2 F1 Instance. Retrieved from https://aws.amazon.com/ec2/instance-types/f1/.

3. Xilinx. 2017. SDAccel Development Environment. Retrieved from http://www.xilinx.com/products/design-tools/software-zone/sdaccel.html. Xilinx. 2017. SDAccel Development Environment. Retrieved from http://www.xilinx.com/products/design-tools/software-zone/sdaccel.html.

4. CCIX. 2018. Cache Coherent Interconnect for Accelerators. Retrieved from https://www.ccixconsortium.com/. CCIX. 2018. Cache Coherent Interconnect for Accelerators. Retrieved from https://www.ccixconsortium.com/.

5. Instruction Set Innovations for the Convey HC-1 Computer

Cited by 29 articles. 订阅此论文施引文献订阅此论文施引文献，注册后可以免费订阅5篇论文的施引文献，订阅后可以查看论文全部施引文献

1. Unveiling the Advantages of Full Coherency Architecture for FPSoC Systems;IEEE Transactions on Very Large Scale Integration (VLSI) Systems;2024-08

2. HitGNN: High-Throughput GNN Training Framework on CPU+Multi-FPGA Heterogeneous Platform;IEEE Transactions on Parallel and Distributed Systems;2024-05

3. FlexForge: Efficient Reconfigurable Cloud Acceleration via Peripheral Resource Disaggregation;2024 Design, Automation & Test in Europe Conference & Exhibition (DATE);2024-03-25

4. FYalSAT: High-Throughput Stochastic Local Search K-SAT Solver on FPGA;IEEE Access;2024

5. Analysis and efficient implementation of IEEE-754 decimal floating point adders/subtractors in FPGAs for DPD and BID encoding;The Journal of Supercomputing;2023-12-06