Affiliation:
1. Global Supercomputing Corporation, Yorktown Heights, NY, USA
2. Eskisehir Technical University, Department of Electrical and Electronics Engineering, Iki Eylul Kampus, Eskisehir, Turkey
Abstract
We present a High Level Synthesis compiler that automatically obtains a multi-chip accelerator system from a single-threaded sequential C/C++ application. Invoking the multi-chip accelerator is functionally identical to invoking the single-threaded sequential code the multi-chip accelerator is compiled from. Therefore, software development for using the multi-chip accelerator hardware is simplified, but the multi-chip accelerator can exhibit extremely high parallelism. We have implemented, tested, and verified our push-button system design model on multiple field-programmable gate arrays (FPGAs) of the Amazon Web Services EC2 F1 instances platform, using, as an example, a sequential-natured DES key search application that does not have any DOALL loops and that tries each candidate key in order and stops as soon as a correct key is found. An 8-FPGA accelerator produced by our compiler achieves 44,600 times better performance than an x86 Xeon CPU executing the sequential single-threaded C program the accelerator was compiled from. New features of our compiler system include: an ability to parallelize outer loops with loop-carried control dependences, an ability to pipeline an outer loop without fully unrolling its inner loops, and fully automated deployment, execution and termination of multi-FPGA application-specific accelerators in the AWS cloud, without requiring any manual steps.
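The kind of loop the abstract describes, one that tries candidate keys in order and stops at the first match, can be sketched in C as follows. This is an illustrative sketch only, not the authors' benchmark code: `toy_encrypt` is a hypothetical stand-in mixing function and is not DES, and the names `search_key`, `first_key`, and `num_keys` are assumptions made for this example. The early `return` is what makes the outer loop non-DOALL: whether iteration i+1 may run depends on the outcome of iteration i, which is a loop-carried control dependence.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical toy cipher used only for illustration; this is NOT DES. */
uint64_t toy_encrypt(uint64_t plaintext, uint64_t key)
{
    uint64_t x = plaintext ^ key;
    x *= 0x9E3779B97F4A7C15ULL;  /* multiply by an odd 64-bit constant */
    x ^= x >> 29;                /* fold high bits into low bits */
    return x + key;
}

/*
 * Sequential key search: try candidate keys first_key, first_key + 1, ...
 * in order and stop at the first key that maps plaintext to ciphertext.
 * Returns true and stores the key in *found_key on success.
 */
bool search_key(uint64_t plaintext, uint64_t ciphertext,
                uint64_t first_key, uint64_t num_keys,
                uint64_t *found_key)
{
    assert(found_key != NULL);

    for (uint64_t i = 0; i < num_keys; i++) {
        uint64_t key = first_key + i;
        if (toy_encrypt(plaintext, key) == ciphertext) {
            *found_key = key;
            /* Early exit: no later iteration may take effect, so each
             * iteration is control-dependent on the one before it. */
            return true;
        }
    }
    return false;  /* no candidate in the range matched */
}
```

A caller would invoke `search_key` exactly as it would any sequential function. Per the abstract, the compiled accelerator is invoked in a functionally identical way to the sequential code it was compiled from, so hardware that evaluates many candidate keys in parallel would still have to report the same result as this in-order loop.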
Publisher
Association for Computing Machinery (ACM)