Affiliation:
1. The University of New South Wales, Australia
2. The University of New South Wales, Australia and University of Technology, Sydney, Australia
Abstract
Subgraph enumeration aims to find all the subgraphs of a large data graph that are isomorphic to a given pattern graph. As the subgraph isomorphism operation is computationally intensive, researchers have recently focused on solving this problem in distributed environments, such as MapReduce and Pregel. Among them, the state-of-the-art algorithm, TwinTwigJoin, is proven to be instance optimal based on a left-deep join framework. However, it still does not scale to large graphs because of the constraints of the left-deep join framework and the requirement that each decomposed component (join unit) be a star. In this paper, we propose SEED, a scalable subgraph enumeration approach in the distributed environment. Compared to TwinTwigJoin, SEED returns an optimal solution in a generalized join framework without the constraints of TwinTwigJoin. We use both stars and cliques as the join units, and design an effective distributed graph storage mechanism to support such an extension. We develop a comprehensive cost model that estimates the number of matches of any given pattern graph by considering the power-law degree distribution in the data graph. We then generalize the left-deep join framework and develop a dynamic-programming algorithm to compute an optimal bushy join plan. We also consider overlaps among the join units. Finally, we propose clique compression to further improve the algorithm by reducing the number of intermediate results. Extensive performance studies are conducted on several real graphs, one containing billions of edges. The results demonstrate that our algorithm outperforms all other state-of-the-art algorithms by more than one order of magnitude.
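To make the problem statement concrete, the following is a minimal single-machine sketch of subgraph enumeration by brute force. It is not SEED or TwinTwigJoin: it uses no join units, cost model, or distribution, and the function name `enumerate_subgraphs` and the edge-list input format are assumptions made only for this illustration. It shows what a "match" is: an injective mapping of pattern vertices to data vertices under which every pattern edge lands on a data edge.

```python
from itertools import permutations


def enumerate_subgraphs(data_edges, pattern_edges):
    """Return every subgraph of the data graph isomorphic to the pattern.

    Both graphs are undirected, simple, and given as iterables of (u, v)
    pairs (an assumed input format). Each match is a frozenset of
    data-graph edges, so mappings that differ only by a pattern
    automorphism are reported once.
    """
    data = {frozenset(e) for e in data_edges}
    data_vertices = sorted({v for e in data for v in e})
    pattern = [tuple(e) for e in pattern_edges]
    pattern_vertices = sorted({v for e in pattern for v in e})

    matches = set()
    # Try every injective assignment of pattern vertices to data vertices.
    # This is exponential in the pattern size and only meant for tiny inputs.
    for image in permutations(data_vertices, len(pattern_vertices)):
        mapping = dict(zip(pattern_vertices, image))
        mapped = {frozenset((mapping[u], mapping[v])) for u, v in pattern}
        # Keep the assignment only if every pattern edge maps to a data edge.
        if mapped <= data:
            matches.add(frozenset(mapped))
    return matches


if __name__ == "__main__":
    data = [(1, 2), (2, 3), (1, 3), (3, 4)]
    triangle = [("a", "b"), ("b", "c"), ("a", "c")]
    for match in enumerate_subgraphs(data, triangle):
        print(sorted(tuple(sorted(e)) for e in match))
```

The cost of this exhaustive search grows factorially with the pattern size, which is the kind of blow-up that join-based approaches such as the one described in the abstract aim to avoid.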
Subject
General Earth and Planetary Sciences; Water Science and Technology; Geography, Planning and Development
Cited by
42 articles.
1. DuMato: An efficient warp-centric subgraph enumeration system for GPU;Journal of Parallel and Distributed Computing;2024-09
2. Optimizing subgraph retrieval and matching with an efficient indexing scheme;Knowledge and Information Systems;2024-07-16
3. GPU-accelerated relaxed graph pattern matching algorithms;The Journal of Supercomputing;2024-06-16
4. Understanding High-Performance Subgraph Pattern Matching: A Systems Perspective;Proceedings of the 7th Joint Workshop on Graph Data Management Experiences & Systems (GRADES) and Network Data Analytics (NDA);2024-06-09
5. Speeding Up Subgraph Matching Queries with Schema Guided Index;Proceedings of the 2024 3rd International Conference on Networks, Communications and Information Technology;2024-06-07