NeutronOrch: Rethinking Sample-Based GNN Training under CPU-GPU Heterogeneous Environments

Author:

Ai Xin¹, Wang Qiange², Cao Chunyu¹, Zhang Yanfeng¹, Chen Chaoyi¹, Yuan Hao¹, Gu Yu¹, Yu Ge¹

Affiliation:

1. Northeastern University, China

2. National University of Singapore, Singapore

Abstract

Graph Neural Networks (GNNs) have shown exceptional performance across a wide range of applications. Current frameworks leverage CPU-GPU heterogeneous environments for GNN model training, incorporating mini-batch and sampling techniques to mitigate GPU memory constraints. In such settings, sample-based GNN training can be divided into three phases: sampling, gathering, and training. Existing GNN systems deploy various task orchestration methods to execute each phase on either the CPU or the GPU. However, through comprehensive experimentation and analysis, we observe that these task orchestration approaches do not optimally exploit the available heterogeneous resources, hindered by either inefficient CPU processing or GPU resource bottlenecks. In this paper, we propose NeutronOrch, a system for sample-based GNN training that ensures balanced utilization of the CPU and GPU. NeutronOrch decouples the training process by layer and pushes down the training task of the bottom layer to the CPU. This significantly reduces the computational load and memory footprint of GPU training. To avoid inefficient CPU processing, NeutronOrch only offloads the training of frequently accessed vertices to the CPU and lets the GPU reuse their embeddings with bounded staleness. Furthermore, NeutronOrch provides a fine-grained pipeline design for the layer-based task orchestration method. The experimental results show that compared with state-of-the-art GNN systems, NeutronOrch can achieve up to 11.51× performance speedup.
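The hot-vertex offloading idea in the abstract can be illustrated with a short sketch. The following Python/PyTorch snippet is a minimal, hypothetical illustration, not NeutronOrch's actual implementation: bottom-layer embeddings of frequently accessed ("hot") vertices are computed on the CPU and cached, and the GPU reuses a cached embedding only while it is within a bounded-staleness window, recomputing the bottom layer itself for all other vertices. All names (HotEmbeddingCache, gather_bottom_embeddings, staleness_bound, etc.) are illustrative assumptions.

```python
# Minimal sketch (assumption, not NeutronOrch's code) of layer-based task
# orchestration: the bottom GNN layer for hot vertices is computed on the CPU
# and cached; the GPU reuses a cached embedding only while it is at most
# `staleness_bound` epochs old, and computes the bottom layer for the rest.
import torch


class HotEmbeddingCache:
    """CPU-resident cache of bottom-layer embeddings for hot vertices."""

    def __init__(self, num_vertices: int, hidden_dim: int, staleness_bound: int):
        self.emb = torch.zeros(num_vertices, hidden_dim)        # host memory
        self.version = torch.full((num_vertices,), -(10 ** 9))  # last refresh epoch
        self.staleness_bound = staleness_bound

    def refresh(self, vertex_ids, features, cpu_bottom_layer, epoch: int):
        # Bottom-layer forward pass for hot vertices, executed on the CPU.
        with torch.no_grad():
            self.emb[vertex_ids] = cpu_bottom_layer(features[vertex_ids])
        self.version[vertex_ids] = epoch

    def fresh_mask(self, vertex_ids, epoch: int):
        # A cached embedding may be reused only within the staleness bound.
        return (epoch - self.version[vertex_ids]) <= self.staleness_bound


def gather_bottom_embeddings(vertex_ids, features, cache, gpu_bottom_layer,
                             epoch: int, device: torch.device):
    """Reuse bounded-stale CPU embeddings for hot vertices; compute the rest on the GPU."""
    fresh = cache.fresh_mask(vertex_ids, epoch)                 # CPU bool mask
    out = torch.empty(len(vertex_ids), cache.emb.size(1), device=device)
    # Fresh hot vertices: copy CPU-computed embeddings to the GPU (no GPU compute).
    out[fresh.to(device)] = cache.emb[vertex_ids[fresh]].to(device)
    # Cold or stale vertices: transfer raw features and run the bottom layer on the GPU.
    cold = ~fresh
    if cold.any():
        out[cold.to(device)] = gpu_bottom_layer(features[vertex_ids[cold]].to(device))
    return out
```

In the system described by the paper, the CPU-side refresh of hot-vertex embeddings would run as part of the fine-grained pipeline, overlapping with GPU training; the sketch above shows only the per-batch reuse-or-recompute decision.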

Publisher

Association for Computing Machinery (ACM)

