Affiliations:
1. University of California Los Angeles, USA
2. Simon Fraser University, Canada
3. Cornell University, USA
Abstract
In this article, we propose TAPA, an end-to-end framework that compiles a C++ task-parallel dataflow program into a high-frequency FPGA accelerator. Compared to existing solutions, TAPA has two major advantages. First, TAPA provides a set of convenient APIs that allow users to easily express flexible and complex inter-task communication structures. Second, TAPA performs coarse-grained floorplanning during HLS compilation so that potential critical paths can be pipelined accurately. In addition, TAPA implements several optimization techniques tailored specifically to modern HBM-based FPGAs. In experiments on a total of 43 designs, we improve the average frequency from 147 MHz to 297 MHz (a 102% improvement) with no loss of throughput and a negligible change in resource utilization. Notably, in 16 of these experiments, designs that were originally unroutable achieve an average of 274 MHz. The framework is available at
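To make "task-parallel dataflow program" concrete, the sketch below models three tasks that communicate only through bounded FIFO channels and run concurrently. It is not TAPA's API: it uses only the C++ standard library, threads stand in for hardware modules, and the names `BoundedStream`, `load`, `square`, `store`, and `run_pipeline` are hypothetical. A blocking write on a full FIFO models back-pressure, and `close()` models an end-of-stream marker.

```cpp
#include <cassert>
#include <condition_variable>
#include <cstddef>
#include <functional>
#include <mutex>
#include <optional>
#include <queue>
#include <thread>

// Hypothetical bounded FIFO standing in for an on-chip stream (not TAPA's API).
template <typename T>
class BoundedStream {
 public:
  explicit BoundedStream(std::size_t depth) : depth_(depth) {}

  // Blocks while the FIFO is full, i.e. back-pressure on the producer.
  void write(T value) {
    std::unique_lock<std::mutex> lock(mu_);
    not_full_.wait(lock, [&] { return buf_.size() < depth_; });
    buf_.push(std::move(value));
    not_empty_.notify_one();
  }

  // Marks end of stream; readers get std::nullopt once the FIFO drains.
  void close() {
    std::lock_guard<std::mutex> lock(mu_);
    closed_ = true;
    not_empty_.notify_all();
  }

  // Blocks while the FIFO is empty and still open.
  std::optional<T> read() {
    std::unique_lock<std::mutex> lock(mu_);
    not_empty_.wait(lock, [&] { return !buf_.empty() || closed_; });
    if (buf_.empty()) return std::nullopt;
    T value = std::move(buf_.front());
    buf_.pop();
    not_full_.notify_one();
    return value;
  }

 private:
  std::size_t depth_;
  std::queue<T> buf_;
  bool closed_ = false;
  std::mutex mu_;
  std::condition_variable not_full_, not_empty_;
};

// Task 1: emits 1..n, then closes its output channel.
void load(BoundedStream<long>& out, long n) {
  for (long i = 1; i <= n; ++i) out.write(i);
  out.close();
}

// Task 2: squares every element it receives.
void square(BoundedStream<long>& in, BoundedStream<long>& out) {
  while (auto v = in.read()) out.write(*v * *v);
  out.close();
}

// Task 3: reduces the stream to a single sum.
void store(BoundedStream<long>& in, long& result) {
  long sum = 0;
  while (auto v = in.read()) sum += *v;
  result = sum;
}

// Top level: instantiates the channels and launches all tasks concurrently.
// depth must be at least 1, otherwise writers block forever.
long run_pipeline(long n, std::size_t depth = 2) {
  BoundedStream<long> a(depth), b(depth);
  long result = 0;
  std::thread t1(load, std::ref(a), n);
  std::thread t2(square, std::ref(a), std::ref(b));
  std::thread t3(store, std::ref(b), std::ref(result));
  t1.join();
  t2.join();
  t3.join();
  return result;
}
```

For example, `run_pipeline(10)` returns 385, the sum of the squares of 1 through 10. The point of the structure is that the tasks share no state other than the channels, which is what allows a compiler to map each task to a separate hardware module and each channel to a FIFO.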
https://github.com/UCLA-VAST/tapa
and the core floorplan module is available at
https://github.com/UCLA-VAST/AutoBridge
Funders
Intel/NSF CAPA program, NSF NeuroNex Award
NIH
NSERC Discovery
CFI John R. Evans Leaders Fund, NSF projects
Publisher
Association for Computing Machinery (ACM)