Representative and Back-In-Time Sampling from Real-world Hypergraphs

Published: 2024-04-26 Issue: 6 Volume: 18 Pages: 1-48
ISSN: 1556-4681
Container-title: ACM Transactions on Knowledge Discovery from Data
Language: en
Short-container-title: ACM Trans. Knowl. Discov. Data

Author:

Choe Minyoung¹, Yoo Jaemin², Lee Geon³, Baek Woonsung², Kang U⁴, Shin Kijung³

Affiliation:

1. Kim Jaechul Graduate School of AI, KAIST, Seoul, Korea (the Republic of)

2. School of Electrical Engineering, KAIST, Daejeon, Korea (the Republic of)

3. Kim Jaechul Graduate School of AI, KAIST, Seoul, Korea (the Republic of)

4. Dept. of Computer Science and Engineering, Seoul National University, Seoul, Korea (the Republic of)

Abstract

Graphs are widely used for representing pairwise interactions in complex systems. Since such real-world graphs are large and often ever-growing, sampling subgraphs is useful for various purposes, including simulation, visualization, stream processing, representation learning, and crawling. However, many complex systems consist of group interactions (e.g., collaborations of researchers and discussions on online Q&A platforms) and thus are represented more naturally and accurately by hypergraphs than by ordinary graphs. Motivated by the prevalence of large-scale hypergraphs, we study the problem of sampling from real-world hypergraphs, aiming to answer (Q1) how can we measure the goodness of sub-hypergraphs, and (Q2) how can we efficiently find a “good” sub-hypergraph. Regarding Q1, we distinguish between two goals: (a) representative sampling, which aims at capturing the characteristics of the input hypergraph, and (b) back-in-time sampling, which aims at closely approximating a past snapshot of the input time-evolving hypergraph. To evaluate the similarity of the sampled sub-hypergraph to the target (i.e., the input hypergraph or its past snapshot), we consider 10 graph-level, hyperedge-level, and node-level statistics. Regarding Q2, we first conduct a thorough analysis of various intuitive approaches using 11 real-world hypergraphs. Then, based on this analysis, we propose MiDaS and MiDaS-B, designed for representative sampling and back-in-time sampling, respectively. Regarding representative sampling, we demonstrate through extensive experiments that MiDaS, which employs a sampling bias toward high-degree nodes in hyperedge selection, is (a) Representative: finding overall the most representative samples among 15 considered approaches, (b) Fast: several orders of magnitude faster than the strongest competitors, and (c) Automatic: automatically tuning the degree of sampling bias.
Regarding back-in-time sampling, we demonstrate that MiDaS-B inherits the strengths of MiDaS despite an additional challenge: the unavailability of the target (i.e., the past snapshot). It effectively handles this challenge by focusing on replicating universal evolutionary patterns, rather than directly replicating the target.
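
The abstract states only that MiDaS biases hyperedge selection toward high-degree nodes; it does not give the weighting or the tuning procedure. As a minimal sketch of that general idea, and not the authors' implementation, the Python below draws hyperedges without replacement with weight equal to the minimum degree of a hyperedge's nodes raised to a power `alpha`. The function names, the minimum-degree weighting, and the `alpha` parameter are all assumptions made for illustration; the automatic tuning of the bias mentioned in the abstract is not shown.

```python
import random
from collections import Counter


def node_degrees(hyperedges):
    """Return the number of hyperedges each node belongs to."""
    deg = Counter()
    for edge in hyperedges:
        deg.update(set(edge))
    return deg


def biased_hyperedge_sample(hyperedges, k, alpha=1.0, seed=None):
    """Draw k distinct hyperedges, favouring those made of high-degree nodes.

    Assumed weighting (illustrative only): weight(e) = (min node degree in e) ** alpha.
    alpha = 0 reduces to uniform sampling; larger alpha strengthens the bias.
    Every hyperedge is assumed to be non-empty.
    """
    if not 0 <= k <= len(hyperedges):
        raise ValueError("k must be between 0 and the number of hyperedges")
    rng = random.Random(seed)
    deg = node_degrees(hyperedges)
    remaining = list(range(len(hyperedges)))
    weights = [min(deg[v] for v in hyperedges[i]) ** alpha for i in remaining]
    chosen = []
    for _ in range(k):
        # Pick one position in proportion to its weight, then remove it
        # so that the same hyperedge cannot be drawn twice.
        pos = rng.choices(range(len(remaining)), weights=weights, k=1)[0]
        chosen.append(remaining.pop(pos))
        weights.pop(pos)
    return [hyperedges[i] for i in chosen]


# Toy hypergraph: each tuple is one hyperedge (a group interaction).
toy = [("a", "b", "c"), ("a", "b"), ("a", "d"), ("e", "f"), ("a", "c", "e")]
sample = biased_hyperedge_sample(toy, k=3, alpha=2.0, seed=0)
print(sample)
```

Setting `alpha` to 0 gives a uniform baseline, which makes it easy to compare how the bias changes statistics such as the degree distribution of the sample.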

Funder

Korea government (MSIT) grant funded by the Korea government

Institute of Information & Communications Technology Planning & Evaluation (IITP) grant funded by the Korea government

Publisher

Association for Computing Machinery (ACM)

Link

https://dl.acm.org/doi/pdf/10.1145/3653306
