Graph-OPU: A highly flexible FPGA-Based Overlay Processor for Graph Neural Networks

Published: 2024-09-02
ISSN: 1936-7406
Container-title: ACM Transactions on Reconfigurable Technology and Systems
Language: en
Short-container-title: ACM Trans. Reconfigurable Technol. Syst.

Author:

Tang Enhao¹, Li Shun¹, Chen Ruiqi¹, Zhou Hao¹, Ma Yuhanxiao², Zhang Haoyang¹, Yu Jun¹, Wang Kun¹

Affiliation:

1. School of Microelectronics, Fudan University, China

2. New York University, USA

Abstract

Field-programmable gate arrays (FPGAs) are an ideal candidate for accelerating graph neural networks (GNNs). However, the FPGA redeployment process is time-consuming when updating or switching between diverse GNN models across different applications. Existing GNN processors eliminate the need for FPGA redeployment when switching between different GNN models. However, adapting matrix multiplication types by switching processing units decreases hardware utilization. In addition, the bandwidth of DDR limits further improvements in hardware performance. This paper proposes a highly flexible FPGA-based overlay processor for GNN accelerations. Graph-OPU provides excellent flexibility and programmability for users, as the executable code of GNN models is automatically compiled and reloaded without requiring FPGA redeployment. First, we customize the compiler and instruction sets for the inference process of different GNN models. Second, we customize the datapath and optimize the data format in the microarchitecture to fully leverage the advantages of high bandwidth memory (HBM). Third, we design a unified matrix multiplication to handle both sparse-dense matrix multiplication (SpMM) and general matrix multiplication (GEMM), enhancing Graph-OPU performance. During Graph-OPU execution, the computational units are shared between SpMM and GEMM instead of being switched, which improves the hardware utilization. Finally, we implement a hardware prototype on the Xilinx Alveo U50 and test the mainstream GNN models using various datasets. Experimental results show that Graph-OPU achieves up to 1654× and 63× speedup, as well as up to 5305× and 422× energy efficiency boosts, compared to implementations on CPU and GPU, respectively. Graph-OPU outperforms state-of-the-art (SOTA) end-to-end overlay accelerators for GNN, reducing latency by an average of 1.36× and improving energy efficiency by 1.41× on average. Moreover, Graph-OPU exhibits an average 1.45× speed improvement in end-to-end latency over the SOTA GNN processor. Graph-OPU represents an in-depth study of an FPGA-based overlay processor for GNNs, offering high flexibility, speedup, and energy efficiency.
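
Illustrative sketch (not taken from the paper): the abstract says SpMM and GEMM share the same computational units rather than switching between separate ones. The software analogy below routes both operations through a single multiply-accumulate loop. Everything here is an assumption made for illustration: the GCN-style layer, the row-wise (column, value) operand format, and the names `to_rows`, `unified_matmul`, and `gcn_layer` are hypothetical and say nothing about Graph-OPU's actual datapath, data format, or instruction set.

```python
from typing import List, Tuple

Matrix = List[List[float]]
# Each row of the left operand is streamed as (column index, value) pairs.
Rows = List[List[Tuple[int, float]]]


def to_rows(a: Matrix, skip_zeros: bool) -> Rows:
    """Encode a matrix row by row; optionally drop zero entries (sparse case)."""
    return [
        [(j, v) for j, v in enumerate(row) if not (skip_zeros and v == 0)]
        for row in a
    ]


def unified_matmul(lhs: Rows, rhs: Matrix) -> Matrix:
    """Single multiply-accumulate loop used for both SpMM and GEMM."""
    n_cols = len(rhs[0])
    out = [[0.0] * n_cols for _ in lhs]
    for i, row in enumerate(lhs):
        for k, scale in row:
            rhs_row = rhs[k]
            for j in range(n_cols):
                out[i][j] += scale * rhs_row[j]  # the shared MAC step
    return out


def gcn_layer(adj: Matrix, feats: Matrix, weights: Matrix) -> Matrix:
    """Hypothetical GCN-style layer: aggregate (SpMM), combine (GEMM), ReLU."""
    aggregated = unified_matmul(to_rows(adj, skip_zeros=True), feats)
    combined = unified_matmul(to_rows(aggregated, skip_zeros=False), weights)
    return [[max(0.0, v) for v in row] for row in combined]


if __name__ == "__main__":
    adj = [[1, 1, 0], [1, 1, 1], [0, 1, 1]]   # 3-node path graph with self-loops
    feats = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
    weights = [[1.0, -1.0], [0.5, 2.0]]
    print(gcn_layer(adj, feats, weights))
```

In this sketch the only difference between the sparse and dense cases is how the left operand is encoded; the inner loop is identical, which is the software counterpart of sharing compute units instead of swapping them.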

Publisher

Association for Computing Machinery (ACM)

Link

https://dl.acm.org/doi/pdf/10.1145/3691636
