TAPA: A Scalable Task-parallel Dataflow Programming Framework for Modern FPGAs with Co-optimization of HLS and Physical Design

Published: 2023-12-05 Issue: 4 Volume: 16 Page: 1-31
ISSN: 1936-7406
Container-title: ACM Transactions on Reconfigurable Technology and Systems
Language: en
Short-container-title: ACM Trans. Reconfigurable Technol. Syst.

Author:

Guo Licheng¹, Chi Yuze¹, Lau Jason¹, Song Linghao¹, Tian Xingyu², Khatti Moazin², Qiao Weikang¹, Wang Jie¹, Ustun Ecenur³, Fang Zhenman², Zhang Zhiru³, Cong Jason¹

Affiliation:

1. University of California Los Angeles, USA

2. Simon Fraser University, Canada

3. Cornell University, USA

Abstract

In this article, we propose TAPA, an end-to-end framework that compiles a C++ task-parallel dataflow program into a high-frequency FPGA accelerator. Compared to existing solutions, TAPA has two major advantages. First, TAPA provides a set of convenient APIs that allows users to easily express flexible and complex inter-task communication structures. Second, TAPA adopts a coarse-grained floorplanning step during HLS compilation for accurate pipelining of potential critical paths. In addition, TAPA implements several optimization techniques specifically tailored for modern HBM-based FPGAs. In our experiments with a total of 43 designs, we improve the average frequency from 147 MHz to 297 MHz (a 102% improvement) with no loss of throughput and a negligible change in resource utilization. Notably, in 16 experiments, we make the originally unroutable designs achieve 274 MHz, on average. The framework is available at https://github.com/UCLA-VAST/tapa and the core floorplan module is available at https://github.com/UCLA-VAST/AutoBridge

Funder

Intel/NSF CAPA program, the NSF NeuroNex Award

NIH

NSERC Discovery

CFI John R. Evans Leaders Fund, NSF projects

Publisher

Association for Computing Machinery (ACM)

Subject

General Computer Science

Link

https://dl.acm.org/doi/pdf/10.1145/3609335

Reference: 99 articles.

1. Latency Insensitive Design Styles for FPGAs

2. Handbook of Algorithms for Physical Design Automation

3. Fast Unified Floorplan Topology Generation and Sizing on Heterogeneous FPGAs

Cited by 4 articles.

1. PASTA: Programming and Automation Support for Scalable Task-Parallel HLS Programs on Modern Multi-Die FPGAs;ACM Transactions on Reconfigurable Technology and Systems;2024-08-05

2. HiHiSpMV: Sparse Matrix Vector Multiplication with Hierarchical Row Reductions on FPGAs with High Bandwidth Memory;2024 IEEE 32nd Annual International Symposium on Field-Programmable Custom Computing Machines (FCCM);2024-05-05

3. HiSpMV: Hybrid Row Distribution and Vector Buffering for Imbalanced SpMV Acceleration on FPGAs;Proceedings of the 2024 ACM/SIGDA International Symposium on Field Programmable Gate Arrays;2024-04

4. Scheduling and Physical Design;Proceedings of the 2024 International Symposium on Physical Design;2024-03-12