Optimizing High-Throughput Inference on Graph Neural Networks at Shared Computing Facilities with the NVIDIA Triton Inference Server

Published:2024-07-18 Issue:1 Volume:8
ISSN:2510-2036
Container-title:Computing and Software for Big Science
language:en
Short-container-title:Comput Softw Big Sci

Author:

Savard Claire, Manganelli Nicholas, Holzman Burt, Gray Lindsey, Perloff Alexx, Pedro Kevin, Stenson Kevin, Ulmer Keith

Abstract

With machine learning applications now spanning a variety of computational tasks, multi-user shared computing facilities are devoting a rapidly increasing proportion of their resources to such algorithms. Graph neural networks (GNNs), for example, have provided astounding improvements in extracting complex signatures from data and are now widely used in a variety of applications, such as particle jet classification in high energy physics (HEP). However, GNNs also come with an enormous computational penalty that requires the use of GPUs to maintain reasonable throughput. At shared computing facilities, such as those used by physicists at Fermi National Accelerator Laboratory (Fermilab), methodical resource allocation and high throughput at the many-user scale are key to ensuring that resources are being used as efficiently as possible. These facilities, however, primarily provide CPU-only nodes, which proves detrimental to time-to-insight and computational throughput for workflows that include machine learning inference. In this work, we describe how a shared computing facility can use the NVIDIA Triton Inference Server to optimize its resource allocation and computing structure, recovering high throughput while scaling out to multiple users by massively parallelizing their machine learning inference. To demonstrate the effectiveness of this system in a realistic multi-user environment, we use the Fermilab Elastic Analysis Facility augmented with the Triton Inference Server to provide scalable and high-throughput access to a HEP-specific GNN and report on the outcome.

Funder

U.S. Department of Energy

Fermilab

National Science Foundation

Publisher

Springer Science and Business Media LLC

Link

https://link.springer.com/content/pdf/10.1007/s41781-024-00123-2.pdf

Cited by 1 article.

1. Portable Acceleration of CMS Computing Workflows with Coprocessors as a Service; Computing and Software for Big Science; 2024-09-04