BurstZ+: Eliminating The Communication Bottleneck of Scientific Computing Accelerators via Accelerated Compression

Published:2022-01-31 Issue:2 Volume:15 Page:1-34
ISSN:1936-7406
Container-title:ACM Transactions on Reconfigurable Technology and Systems
language:en
Short-container-title:ACM Trans. Reconfigurable Technol. Syst.

Author:

Sun Gongjin¹, Kang Seongyoung², Jun Sang-Woo¹

Affiliation:

1. Department of Computer Science, University of California, Irvine

2. Department of Computer Science, Kookmin University, Irvine

Abstract

We present BurstZ+, an accelerator platform that eliminates the communication bottleneck between PCIe-attached scientific computing accelerators and their host servers via hardware-optimized compression. While accelerators such as GPUs and FPGAs provide enormous computing capabilities, their effectiveness quickly deteriorates once the data is larger than their on-board memory capacity, and performance becomes limited by the communication bandwidth of moving data between host memory and the accelerator. Compression has not been very useful in solving this issue because of the performance and efficiency problems of compressing floating-point numbers, which scientific data often consists of. BurstZ+ is an FPGA-based prototype accelerator platform that addresses the bandwidth issue via a class of novel hardware-optimized floating-point compression algorithms called ZFP-V. We demonstrate that BurstZ+ can completely remove the host-side communication bottleneck for accelerators, using multiple stencil kernels with a wide range of operational intensities. Evaluated against hand-optimized implementations of kernel accelerators of the same architecture, our single-pipeline BurstZ+ prototype outperforms an accelerator without compression by almost 4×, and even an accelerator with enough memory for the entire dataset by over 2×. Furthermore, the projected performance of BurstZ+ on a future, faster FPGA scales to almost 7× that of the same accelerator without compression, whose performance is still limited by the PCIe bandwidth.

Funder

NSF

Publisher

Association for Computing Machinery (ACM)

Subject

General Computer Science

Link

https://dl.acm.org/doi/pdf/10.1145/3476831

References: 72 articles.

1. LogCA

2. Accelerating Lattice Boltzmann Fluid Flow Simulations Using Graphics Processors

3. Evaluating lossy data compression on climate simulation data within a large ensemble

4. Matrix multiplication on heterogeneous platforms

5. Sebastian Breß, Max Heimel, Norbert Siegmund, Ladjel Bellatreche, and Gunter Saake. 2014. GPU-accelerated database systems: Survey and open challenges. In Transactions on Large-Scale Data-and Knowledge-Centered Systems XV. Springer, 1–35.

Cited by 4 articles.

1. A data compressor for FPGA-based state vector quantum simulators;14th International Symposium on Highly Efficient Accelerators and Reconfigurable Technologies (HEART'24);2024-06-19

2. Increasing FPGA Accelerators Memory Bandwidth With a Burst-Friendly Memory Layout;IEEE Transactions on Computer-Aided Design of Integrated Circuits and Systems;2023-05

3. A compression-based memory-efficient optimization for out-of-core GPU stencil computation;The Journal of Supercomputing;2023-02-20

4. ZHW: A Numerical CODEC for Big Data Scientific Computation;2022 International Conference on Field-Programmable Technology (ICFPT);2022-12-05