Scalable Pattern Matching in Metadata Graphs via Constraint Checking

Author:

Reza Tahsin1,Halawa Hassan1,Ripeanu Matei1,Sanders Geoffrey2,Pearce Roger A.2

Affiliation:

1. University of British Columbia, Vancouver, BC Canada

2. Lawrence Livermore National Laboratory, Livermore, CA USA

Abstract

Pattern matching is a fundamental tool for answering complex graph queries. Unfortunately, existing solutions have limited capabilities: They do not scale to process large graphs and/or support only a restricted set of search templates or usage scenarios. Moreover, the algorithms at the core of the existing techniques are not suitable for today’s graph processing infrastructures relying on horizontal scalability and shared-nothing clusters, as most of these algorithms are inherently sequential and difficult to parallelize. We present an algorithmic pipeline that bases pattern matching on constraint checking . The key intuition is that each vertex and edge participating in a match has to meet a set of constraints implicitly specified by the search template. These constraints can be verified independently and typically are less expensive to compute than searching the full template. The pipeline we propose generates these constraints and iterates over them to eliminate all the vertices and edges that do not participate in any match, thus reducing the background graph to a subgraph that is the union of all template matches—the complete set of all vertices and edges that participate in at least one match. Additional analysis can be performed on this annotated, reduced graph, such as full match enumeration, match counting, or computing vertex/edge centrality. Furthermore, a vertex-centric formulation for constraint checking algorithms exists, and this makes it possible to harness existing high-performance, vertex-centric graph processing frameworks. This technique (i) enables highly scalable pattern matching in metadata (labeled) graphs; (ii) supports arbitrary patterns with 100% precision; (iii) enables tradeoffs between precision and time-to-solution, while always selects all vertices and edges that participate in matches, thus offering 100% recall; and (iv) supports a set of popular data analytics scenarios. We implement our approach on top of HavoqGT, an open-source asynchronous graph processing framework, and demonstrate its advantages through strong and weak scaling experiments on massive scale real-world (up to 257 billion edges) and synthetic (up to 4.4 trillion edges) labeled graphs, respectively, and at scales (1,024 nodes / 36,864 cores), orders of magnitude larger than used in the past for similar problems. This article serves two purposes: First, it synthesises the knowledge accumulated during a long-term project [Reza et al. 2017, 2018; Tripoul et al. 2018]. Second, it presents new system features, usage scenarios, optimizations, and comparisons with related work that strengthen the confidence that pattern matching based on iterative pruning via constraint checking is an effective and scalable approach in practice. The new contributions include the following: (i) We demonstrate the ability of the constraint checking approach to efficiently support two additional search scenarios that often emerge in practice, interactive incremental search and exploratory search . (ii) We empirically compare our solution with two additional state-of-the-art systems, Arabsque [Teixeira et al. 2015] and TriAD [Gurajada et al. 2014]. (iii) We show the ability of our solution to accommodate a more diverse range of datasets with varying properties, e.g., scale, skewness, label distribution, and match frequency. (iv) We introduce or extend a number of system features (e.g., work aggregation, load balancing, and the ability to cap the generated traffic) and design optimizations and demonstrate their advantages with respect to improving performance and scalability. (v) We present bottleneck analysis and insights into artifacts that influence performance. (vi) We present a theoretical complexity argument that motivates the performance gains we observe.

Funder

National Laboratory

Publisher

Association for Computing Machinery (ACM)

Subject

Computational Theory and Mathematics,Computer Science Applications,Hardware and Architecture,Modelling and Simulation,Software

Cited by 2 articles. 订阅此论文施引文献 订阅此论文施引文献,注册后可以免费订阅5篇论文的施引文献,订阅后可以查看论文全部施引文献

1. Graph Pattern Matching Based Intelligent Operation and Maintenance Method for Cloud Resource Operation;2023 8th International Conference on Data Science in Cyberspace (DSC);2023-08-18

2. Graph Reachability Pruning: Adaptive Data Reduction for Inexact Subgraph Matching;2022 IEEE International Conference on Knowledge Graph (ICKG);2022-11

同舟云学术

1.学者识别学者识别

2.学术分析学术分析

3.人才评估人才评估

"同舟云学术"是以全球学者为主线,采集、加工和组织学术论文而形成的新型学术文献查询和分析系统,可以对全球学者进行文献检索和人才价值评估。用户可以通过关注某些学科领域的顶尖人物而持续追踪该领域的学科进展和研究前沿。经过近期的数据扩容,当前同舟云学术共收录了国内外主流学术期刊6万余种,收集的期刊论文及会议论文总量共计约1.5亿篇,并以每天添加12000余篇中外论文的速度递增。我们也可以为用户提供个性化、定制化的学者数据。欢迎来电咨询!咨询电话:010-8811{复制后删除}0370

www.globalauthorid.com

TOP

Copyright © 2019-2024 北京同舟云网络信息技术有限公司
京公网安备11010802033243号  京ICP备18003416号-3