LL-GNN: Low Latency Graph Neural Networks on FPGAs for High Energy Physics

Published: 2024-03-18 Issue: 2 Volume: 23 Page: 1-28
ISSN: 1539-9087
Container-title: ACM Transactions on Embedded Computing Systems
Language: en
Short-container-title: ACM Trans. Embed. Comput. Syst.

Author:

Que Zhiqiang¹, Fan Hongxiang¹, Loo Marcus¹, Li He², Blott Michaela³, Pierini Maurizio⁴, Tapper Alexander⁵, Luk Wayne¹

Affiliation:

1. Department of Computing, Imperial College London, London, UK

2. School of Electronic Science and Engineering, Southeast University, Nanjing, China

3. AMD Adaptive and Embedded Computing Group (AECG) Labs, Dublin, Ireland

4. European Organization for Nuclear Research (CERN), Geneva, Switzerland

5. Department of Physics, Imperial College London, London, UK

Abstract

This work presents a novel reconfigurable architecture for Low Latency Graph Neural Network (LL-GNN) designs for particle detectors, delivering unprecedented low latency performance. Incorporating FPGA-based GNNs into particle detectors presents a unique challenge since it requires sub-microsecond latency to deploy the networks for online event selection with a data rate of hundreds of terabytes per second in the Level-1 triggers at the CERN Large Hadron Collider experiments. This article proposes a novel outer-product based matrix multiplication approach, which is enhanced by exploiting the structured adjacency matrix and a column-major data layout. In addition, we propose a custom code transformation for the matrix multiplication operations, which leverages the structured sparsity patterns and binary features of adjacency matrices to reduce latency and improve hardware efficiency. Moreover, a fusion step is introduced to further reduce the end-to-end design latency by eliminating unnecessary boundaries. Furthermore, a GNN-specific algorithm-hardware co-design approach is presented which not only finds a design with a much better latency but also finds a high accuracy design under given latency constraints. To facilitate this, a customizable template for this low latency GNN hardware architecture has been designed and open-sourced, which enables the generation of low-latency FPGA designs with efficient resource utilization using a high-level synthesis tool. Evaluation results show that our FPGA implementation is up to 9.0 times faster and achieves up to 13.1 times higher power efficiency than a GPU implementation. Compared to the previous FPGA implementations, this work achieves 6.51 to 16.7 times lower latency. Moreover, the latency of our FPGA design is sufficiently low to enable deployment of GNNs in a sub-microsecond, real-time collider trigger system, enabling it to benefit from improved accuracy. 
The proposed LL-GNN design advances the next generation of trigger systems by enabling sophisticated algorithms to process experimental data efficiently.
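
The outer-product formulation and the use of a binary, structured adjacency matrix mentioned in the abstract can be illustrated with a small sketch. This is not the authors' implementation. It assumes a fully connected directed graph without self-loops, a receiver-style adjacency matrix whose columns are one-hot, and an edge ordering that groups edges by receiver node. The function names `build_receiver_matrix`, `matmul_outer_product`, and `matmul_structured` are hypothetical and chosen only for this example.

```python
def build_receiver_matrix(n_nodes):
    """Binary (n_nodes x n_edges) matrix; column e is one-hot at the receiver of edge e.

    Assumption: fully connected directed graph without self-loops,
    with edges ordered so that all edges of one receiver are adjacent.
    """
    edges = [(r, s) for r in range(n_nodes) for s in range(n_nodes) if s != r]
    adj = [[0] * len(edges) for _ in range(n_nodes)]
    for e, (r, _sender) in enumerate(edges):
        adj[r][e] = 1
    return adj


def matmul_outer_product(feat, adj):
    """Compute feat @ adj as a sum of outer products.

    feat is P x N (features x nodes), adj is N x E (nodes x edges).
    Step k consumes one column of feat and one row of adj, so feat can be
    streamed column by column (a column-major layout).
    """
    p, n, e = len(feat), len(adj), len(adj[0])
    out = [[0.0] * e for _ in range(p)]
    for k in range(n):
        col = [feat[i][k] for i in range(p)]  # column k of feat
        row = adj[k]                          # row k of adj
        for i in range(p):
            for j in range(e):
                out[i][j] += col[i] * row[j]
    return out


def matmul_structured(feat, n_nodes):
    """Same product with no multiplications or additions.

    Under the assumed edge ordering, row k of adj is 1 exactly on the
    (n_nodes - 1) consecutive columns belonging to receiver k, so each
    outer product reduces to copying column k of feat into that block.
    """
    p = len(feat)
    fan = n_nodes - 1
    out = [[0.0] * (n_nodes * fan) for _ in range(p)]
    for k in range(n_nodes):
        for i in range(p):
            value = feat[i][k]
            for j in range(k * fan, (k + 1) * fan):
                out[i][j] = value
    return out


features = [[1.0, 2.0, 3.0],
            [4.0, 5.0, 6.0]]  # 2 features for each of 3 nodes
receiver = build_receiver_matrix(3)
dense_result = matmul_outer_product(features, receiver)
structured_result = matmul_structured(features, 3)
```

In this toy case both functions produce the same matrix, but the structured version replaces every multiply-accumulate with a copy, which is the kind of saving that binary entries and a known sparsity pattern make possible.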

Funder

United Kingdom EPSRC

CERN, AMD and SRC

Publisher

Association for Computing Machinery (ACM)

Link

https://dl.acm.org/doi/pdf/10.1145/3640464

Cited by 4 articles.

1. Opportunities and challenges of graph neural networks in electrical engineering;Nature Reviews Electrical Engineering;2024-08-05

2. Ultrafast jet classification at the HL-LHC;Machine Learning: Science and Technology;2024-07-18

3. Low Latency Variational Autoencoder on FPGAs;IEEE Journal on Emerging and Selected Topics in Circuits and Systems;2024-06

4. PARAG: PIM Architecture for Real-Time Acceleration of GCNs;2023 IEEE 30th International Conference on High Performance Computing, Data, and Analytics (HiPC);2023-12-18