LL-GNN: Low Latency Graph Neural Networks on FPGAs for High Energy Physics

Author:

Que Zhiqiang 1, Fan Hongxiang 1, Loo Marcus 1, Li He 2, Blott Michaela 3, Pierini Maurizio 4, Tapper Alexander 5, Luk Wayne 1

Affiliation:

1. Department of Computing, Imperial College London, UK

2. School of Electronic and Engineering, Southeast University, China

3. AMD Adaptive and Embedded Computing Group (AECG) Labs, Ireland

4. European Organization for Nuclear Research (CERN), Switzerland

5. Department of Physics, Imperial College London, UK

Abstract

This work presents a novel reconfigurable architecture for Low Latency Graph Neural Network (LL-GNN) designs for particle detectors, delivering unprecedentedly low latency. Incorporating FPGA-based GNNs into particle detectors is uniquely challenging: deploying the networks for online event selection in the Level-1 triggers of the CERN Large Hadron Collider experiments, at data rates of hundreds of terabytes per second, requires sub-microsecond latency. This paper proposes a novel outer-product based matrix multiplication approach, which is enhanced by exploiting the structured adjacency matrix and a column-major data layout. In addition, we propose a custom code transformation for the matrix multiplication operations, which leverages the structured sparsity patterns and binary features of adjacency matrices to reduce latency and improve hardware efficiency. Moreover, a fusion step is introduced to further reduce the end-to-end design latency by eliminating unnecessary boundaries. Furthermore, a GNN-specific algorithm-hardware co-design approach is presented that finds designs with much lower latency as well as high-accuracy designs under given latency constraints. To facilitate this, a customizable template for this low latency GNN hardware architecture has been designed and open-sourced, which enables the generation of low-latency FPGA designs with efficient resource utilization using a high-level synthesis tool. Evaluation results show that our FPGA implementation is up to 9.0 times faster and achieves up to 13.1 times higher power efficiency than a GPU implementation. Compared to previous FPGA implementations, this work achieves 6.51 to 16.7 times lower latency. Moreover, the latency of our FPGA design is sufficiently low to enable deployment of GNNs in a sub-microsecond, real-time collider trigger system, enabling it to benefit from improved accuracy. The proposed LL-GNN design advances the next generation of trigger systems by enabling sophisticated algorithms to process experimental data efficiently.
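The abstract does not spell out the outer-product formulation, so the following is only a minimal software sketch of the general idea, not the authors' hardware design. It assumes a fully connected directed graph whose binary receiver matrix has exactly one 1 per column, with the edges of each receiver node stored contiguously; that edge ordering and all function names (dense_matmul, receiver_matrix, outer_product_matmul_binary) are illustrative assumptions. Under these assumptions, each outer product of a feature column with a row of the adjacency matrix reduces to copying that column into a contiguous block of output columns, so no multiplications are needed and a column-major layout lets each column be read once.

```python
def dense_matmul(a, b):
    """Reference inner-product matmul on lists of rows (for checking only)."""
    rows, inner, cols = len(a), len(b), len(b[0])
    return [[sum(a[i][k] * b[k][j] for k in range(inner)) for j in range(cols)]
            for i in range(rows)]


def receiver_matrix(n_nodes):
    """Binary receiver matrix (n_nodes x n_edges) of a fully connected graph.

    Assumed edge order: edge e is received by node e // (n_nodes - 1).
    """
    n_edges = n_nodes * (n_nodes - 1)
    rr = [[0] * n_edges for _ in range(n_nodes)]
    for e in range(n_edges):
        rr[e // (n_nodes - 1)][e] = 1
    return rr


def outer_product_matmul_binary(feature_columns, n_nodes):
    """Compute I @ Rr column by column without any multiplications.

    feature_columns[k] is column k of the feature matrix I (column-major).
    The outer product of column k with row k of Rr is nonzero only on the
    contiguous block of edges received by node k, where it equals column k.
    """
    per_node = n_nodes - 1
    out_columns = [None] * (n_nodes * per_node)
    for k in range(n_nodes):
        for e in range(k * per_node, (k + 1) * per_node):
            out_columns[e] = list(feature_columns[k])
    return out_columns


# Hypothetical example: 2 features per node, 3 nodes.
features = [[1, 2, 3], [4, 5, 6]]                   # row-major view of I
columns = [list(col) for col in zip(*features)]     # column-major view of I
result_columns = outer_product_matmul_binary(columns, 3)
result = [list(row) for row in zip(*result_columns)]
```

In this sketch, result matches the dense product of features with receiver_matrix(3), which is a quick way to check the shortcut against the generic multiplication.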

Publisher

Association for Computing Machinery (ACM)

Subject

Hardware and Architecture, Software

