LL-GNN: Low Latency Graph Neural Networks on FPGAs for High Energy Physics

Author:

Que Zhiqiang (1), Fan Hongxiang (1), Loo Marcus (1), Li He (2), Blott Michaela (3), Pierini Maurizio (4), Tapper Alexander (5), Luk Wayne (1)

Affiliation:

1. Department of Computing, Imperial College London, London, UK

2. School of Electronic and Engineering, Southeast University, Nanjing, China

3. AMD Adaptive and Embedded Computing Group (AECG) Labs, Dublin, Ireland

4. European Organization for Nuclear Research (CERN), Geneva, Switzerland

5. Department of Physics, Imperial College London, London, UK

Abstract

This work presents a novel reconfigurable architecture for Low Latency Graph Neural Network (LL-GNN) designs for particle detectors, delivering unprecedented low-latency performance. Incorporating FPGA-based GNNs into particle detectors is uniquely challenging: deploying the networks for online event selection in the Level-1 triggers at the CERN Large Hadron Collider experiments requires sub-microsecond latency at a data rate of hundreds of terabytes per second. This article proposes a novel outer-product based matrix multiplication approach, which is enhanced by exploiting the structured adjacency matrix and a column-major data layout. In addition, we propose a custom code transformation for the matrix multiplication operations, which leverages the structured sparsity patterns and binary features of adjacency matrices to reduce latency and improve hardware efficiency. Moreover, a fusion step is introduced to further reduce the end-to-end design latency by eliminating unnecessary boundaries. Furthermore, a GNN-specific algorithm-hardware co-design approach is presented, which not only finds a design with much lower latency but also finds a high-accuracy design under given latency constraints. To facilitate this, a customizable template for this low-latency GNN hardware architecture has been designed and open-sourced, which enables the generation of low-latency FPGA designs with efficient resource utilization using a high-level synthesis tool. Evaluation results show that our FPGA implementation is up to 9.0 times faster and achieves up to 13.1 times higher power efficiency than a GPU implementation. Compared with previous FPGA implementations, this work achieves 6.51 to 16.7 times lower latency. Moreover, the latency of our FPGA design is sufficiently low to enable deployment of GNNs in a sub-microsecond, real-time collider trigger system, enabling it to benefit from improved accuracy. The proposed LL-GNN design advances the next generation of trigger systems by enabling sophisticated algorithms to process experimental data efficiently.
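As a rough illustration of the outer-product idea mentioned in the abstract, the sketch below multiplies a feature matrix by a binary, structured adjacency matrix in two ways: first as a sum of outer products, then by exploiting the structure so that no multiplications are needed. It is a minimal software sketch, not the paper's hardware design. The fully connected graph without self-loops, the receiver-ordered edge layout, and all function names are assumptions made here for illustration.

```python
from typing import List

Matrix = List[List[float]]


def receiver_matrix(num_nodes: int) -> List[List[int]]:
    """Binary receiver matrix (num_nodes x num_edges), assumed layout.

    Assumes a fully connected directed graph without self-loops, with
    edges ordered by receiver, so column e has its single 1 in row
    e // (num_nodes - 1).
    """
    num_edges = num_nodes * (num_nodes - 1)
    rr = [[0] * num_edges for _ in range(num_nodes)]
    for e in range(num_edges):
        rr[e // (num_nodes - 1)][e] = 1
    return rr


def outer_product_matmul(a: Matrix, b: List[List[int]]) -> Matrix:
    """Compute C = A @ B as a sum over k of (column k of A) x (row k of B)."""
    rows, inner, cols = len(a), len(b), len(b[0])
    c = [[0.0] * cols for _ in range(rows)]
    for k in range(inner):            # one outer product per inner index
        for i in range(rows):
            a_ik = a[i][k]
            for j in range(cols):
                c[i][j] += a_ik * b[k][j]
    return c


def structured_matmul(a: Matrix, num_nodes: int) -> Matrix:
    """Same result as A @ receiver_matrix(num_nodes), without multiplying.

    Because every column of the receiver matrix is one-hot, each output
    column is simply a copy of one column of A.
    """
    num_edges = num_nodes * (num_nodes - 1)
    return [[row[e // (num_nodes - 1)] for e in range(num_edges)] for row in a]


# 2 features x 3 nodes (hypothetical values)
features = [[1.0, 2.0, 3.0], [4.0, 5.0, 6.0]]
dense = outer_product_matmul(features, receiver_matrix(3))
fast = structured_matmul(features, 3)
assert dense == fast
```

In this sketch the structured version only copies columns of the feature matrix, which hints at why binary, structured adjacency matrices can reduce both latency and arithmetic resources. A column-major layout would make each such column copy a contiguous access.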

Funder

United Kingdom EPSRC

CERN, AMD and SRC

Publisher

Association for Computing Machinery (ACM)


Cited by 4 articles.

1. Opportunities and challenges of graph neural networks in electrical engineering;Nature Reviews Electrical Engineering;2024-08-05

2. Ultrafast jet classification at the HL-LHC;Machine Learning: Science and Technology;2024-07-18

3. Low Latency Variational Autoencoder on FPGAs;IEEE Journal on Emerging and Selected Topics in Circuits and Systems;2024-06

4. PARAG: PIM Architecture for Real-Time Acceleration of GCNs;2023 IEEE 30th International Conference on High Performance Computing, Data, and Analytics (HiPC);2023-12-18
