Distributed Graph Processing System and Processing-in-memory Architecture with Precise Loop-carried Dependency Guarantee

Published:2019-11-30 Issue:1-4 Volume:37 Page:1-37
ISSN:0734-2071
Container-title:ACM Transactions on Computer Systems
language:en
Short-container-title:ACM Trans. Comput. Syst.

Author:

Zhuo Youwei¹, Chen Jingji¹, Rao Gengyu¹, Luo Qinyi¹, Wang Yanzhi², Yang Hailong³, Qian Depei³, Qian Xuehai¹

Affiliation:

1. University of Southern California, USA

2. Northeastern University, USA

3. Beihang University, China

Abstract

To hide the complexity of the underlying system, graph processing frameworks ask programmers to specify graph computations in user-defined functions (UDFs) of a graph-oriented programming model. Because execution is distributed, current frameworks cannot precisely enforce the semantics of UDFs, which leads to unnecessary computation and communication and exemplifies a gap between the programming model and runtime execution. This article proposes novel graph processing frameworks for distributed systems and processing-in-memory (PIM) architectures that precisely enforce loop-carried dependency; i.e., when a condition is satisfied by a neighbor, all following neighbors can be skipped. Our approach instruments the UDFs to express the loop-carried dependency; the distributed execution framework then enforces the precise semantics by performing dependency propagation dynamically. Enforcing loop-carried dependency requires sequential processing of the neighbors of each vertex, which are distributed across different nodes. We propose circulant scheduling in the framework to allow different nodes to process disjoint sets of edges/vertices in parallel while satisfying the sequential requirement. The technique achieves an excellent trade-off between precise semantics and parallelism: the benefits of eliminating unnecessary computation and communication offset the reduced parallelism. We implement a new distributed graph processing framework, SympleGraph, and two variants of runtime systems, GraphS and GraphSR, for PIM-based graph processing architecture, which significantly outperform the state of the art.
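The loop-carried dependency and circulant scheduling described in the abstract can be illustrated with a small single-process simulation. This is only a sketch under stated assumptions, not SympleGraph's actual API: the names `owner` and `bottom_up_step`, the modulo vertex partitioning, the choice of a bottom-up BFS step as the UDF, and the `satisfied` set standing in for propagated dependency state are all hypothetical. Each simulated node holds the in-edges whose source it owns; in step `s`, node `m` processes destination block `(m + s) % num_nodes`, so the nodes touch disjoint blocks within a step while every block is visited by the nodes one after another.

```python
from typing import Dict, Iterable, List, Set, Tuple

Edge = Tuple[int, int]  # (source, destination)


def owner(vertex: int, num_nodes: int) -> int:
    """Hypothetical partitioning: a vertex belongs to node (vertex mod num_nodes)."""
    return vertex % num_nodes


def bottom_up_step(
    edges: Iterable[Edge],
    frontier: Set[int],
    unvisited: Set[int],
    num_nodes: int,
    precise: bool,
) -> Tuple[Dict[int, int], int]:
    """Simulate one bottom-up BFS step; return (parents found, edges scanned)."""
    # local_in[m][v] lists the in-neighbors of v whose source is owned by node m.
    local_in: List[Dict[int, List[int]]] = [{} for _ in range(num_nodes)]
    for u, v in edges:
        local_in[owner(u, num_nodes)].setdefault(v, []).append(u)

    parent: Dict[int, int] = {}
    satisfied: Set[int] = set()  # stand-in for the propagated dependency state
    edges_scanned = 0

    for step in range(num_nodes):
        # A real system would run the nodes of one step in parallel. The
        # sequential loop is equivalent here because the blocks are disjoint.
        for m in range(num_nodes):
            block = (m + step) % num_nodes  # circulant assignment
            for v in sorted(unvisited):
                if owner(v, num_nodes) != block:
                    continue
                if precise and v in satisfied:
                    continue  # an earlier node already satisfied the condition
                for u in local_in[m].get(v, []):
                    edges_scanned += 1
                    if u in frontier:
                        parent.setdefault(v, u)  # keep the first parent found
                        satisfied.add(v)
                        break  # the loop-carried dependency inside the UDF
    return parent, edges_scanned


edges = [(0, 2), (1, 2), (0, 3), (1, 3)]
precise_result = bottom_up_step(edges, {0, 1}, {2, 3}, num_nodes=2, precise=True)
baseline_result = bottom_up_step(edges, {0, 1}, {2, 3}, num_nodes=2, precise=False)
print(precise_result, baseline_result)
```

In this toy graph both runs find the same parents, but the run with `precise=True` scans 2 edges while the run with `precise=False` scans 4, because the second node to visit each destination block repeats work that the `break` was meant to avoid.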

Funder

National Science Foundation

Publisher

Association for Computing Machinery (ACM)

Subject

General Computer Science

Link

https://dl.acm.org/doi/pdf/10.1145/3453681

References: 85 articles.

1. A scalable processing-in-memory accelerator for parallel graph processing

2. Graph-based methods for analysing networks in cell biology

3. ARM. 2009. ARM Cortex-A5 Processor. Retrieved from http://www.arm.com/products/processors/cortex-a/cortex-a5.php.

4. Analysis and Optimization of the Memory Hierarchy for Graph Processing Workloads

Cited by 1 article.

1. Accelerating Neural Network Training with Processing-in-Memory GPU; 2022 22nd IEEE International Symposium on Cluster, Cloud and Internet Computing (CCGrid); 2022-05