Affiliation:
1. University of Houston, Houston, Texas, USA
2. The University of Sydney, NSW, Australia
Abstract
Dynamic graph traversals (DGTs) currently are widely used in many important application domains, especially in this big-data era that urgently demands high-performance graph processing and analysis. Unlike static graph traversals, DGTs in real-world application scenarios require not only fast traversal acceleration itself but also, more importantly, a runtime strategy that can effectively accommodate the
ever-evolving nature of the graph structure updates followed by a diverse range of graph traversal algorithms
. Because of these special features, state-of-the-art designs on conventional compute-centric architectures (e.g., CPU and GPU) struggle to provide sufficient acceleration for DGT processing due to the dominating irregular memory access patterns in graph traversal algorithms and inefficient platform-specific update mechanisms. In this article, we explore the algorithmic features and runtime requirements of real-world DGTs and identify their unique opportunities of acceleration on the recent Micron Automata Processor (AP), an in-situ memory-centric pattern-matching architecture. These features include the natural mapping between traversal algorithms’ path exploration pattern to classic non-deterministic finite automata processing, AP’s architectural and compilation support for DGTs’ evolving traversal operations, and its inherent hardware fitness. However, despite these benefits, enabling highly efficient DGT execution on AP is non-trivial and faces several major challenges. To tackle them, we propose
DynamAP
, the first AP framework design that enables fast processing for general DGTs. DynamAP is oblivious to periodical traversal algorithm changes and can address the significant overhead caused by frequent graph updates and AP recompilation through our novel hybrid macro designs and associated efficient updating strategies. We evaluate DynamAP against the current DGT designs on a CPU, GPU, and AP with a range of widely adopted DGT algorithms and real-world graphs.
For a single update request
, our DynamAP achieves an average speedup of
21.3x
(up to
39.2x
) over the state-of-the-art implementation on host-AP architecture; an average speedup of
9.2x
(up to
14.7x
) and
1.7x
(up to
2.8x
) over two highly optimized DGT design frameworks on a 64-GB Intel(R) Xeon CPU and a 32-GB NVIDIA Tesla V100 GPU. DynamAP also maintains high performance and resource utilization for high graph update ratios, and can significantly benefit natural graphs that present a high average vertex degree.
Publisher
Association for Computing Machinery (ACM)
Subject
Hardware and Architecture,Information Systems,Software