Representative and Back-In-Time Sampling from Real-world Hypergraphs

Author:

Choe Minyoung (1), Yoo Jaemin (2), Lee Geon (3), Baek Woonsung (2), Kang U (4), Shin Kijung (3)

Affiliation:

1. Kim Jaechul Graduate School of AI, KAIST, Seoul, Korea (the Republic of)

2. School of Electrical Engineering, KAIST, Daejeon, Korea (the Republic of)

3. Kim Jaechul Graduate School of AI, KAIST, Seoul, Korea (the Republic of)

4. Dept. of Computer Science and Engineering, Seoul National University, Seoul, Korea (the Republic of)

Abstract

Graphs are widely used for representing pairwise interactions in complex systems. Since such real-world graphs are large and often ever-growing, sampling subgraphs is useful for various purposes, including simulation, visualization, stream processing, representation learning, and crawling. However, many complex systems consist of group interactions (e.g., collaborations of researchers and discussions on online Q&A platforms) and thus are represented more naturally and accurately by hypergraphs than by ordinary graphs. Motivated by the prevalence of large-scale hypergraphs, we study the problem of sampling from real-world hypergraphs, aiming to answer (Q1) how we can measure the goodness of sub-hypergraphs, and (Q2) how we can efficiently find a “good” sub-hypergraph. Regarding Q1, we distinguish between two goals: (a) representative sampling, which aims at capturing the characteristics of the input hypergraph, and (b) back-in-time sampling, which aims at closely approximating a past snapshot of the input time-evolving hypergraph. To evaluate the similarity of the sampled sub-hypergraph to the target (i.e., the input hypergraph or its past snapshot), we consider 10 graph-level, hyperedge-level, and node-level statistics. Regarding Q2, we first conduct a thorough analysis of various intuitive approaches using 11 real-world hypergraphs. Then, based on this analysis, we propose MiDaS and MiDaS-B, designed for representative sampling and back-in-time sampling, respectively. Regarding representative sampling, we demonstrate through extensive experiments that MiDaS, which employs a sampling bias toward high-degree nodes in hyperedge selection, is (a) Representative: finding overall the most representative samples among 15 considered approaches, (b) Fast: several orders of magnitude faster than the strongest competitors, and (c) Automatic: automatically tuning the degree of sampling bias.
Regarding back-in-time sampling, we demonstrate that MiDaS-B inherits the strengths of MiDaS despite an additional challenge: the unavailability of the target (i.e., the past snapshot). It effectively handles this challenge by focusing on replicating universal evolutionary patterns, rather than directly replicating the target.
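
The abstract states that MiDaS biases hyperedge selection toward high-degree nodes but does not give the exact rule. The following is only a minimal sketch of one way such a bias could look, not the authors' implementation. It assumes that each hyperedge is weighted by the smallest degree among its nodes raised to a power `alpha`, and that hyperedges are drawn without replacement. The names `node_degrees`, `biased_hyperedge_sample`, and `alpha` are hypothetical, and `alpha` is fixed by the caller here, whereas the abstract describes the degree of bias as tuned automatically.

```python
import random
from collections import Counter


def node_degrees(hyperedges):
    """Count, for every node, the number of hyperedges that contain it."""
    deg = Counter()
    for edge in hyperedges:
        for node in set(edge):  # a node counts once per hyperedge
            deg[node] += 1
    return deg


def biased_hyperedge_sample(hyperedges, k, alpha, seed=0):
    """Draw up to k distinct hyperedges, favoring those whose nodes all have high degree.

    Assumption (not taken from the abstract): the weight of a hyperedge is
    (minimum node degree in the hyperedge) ** alpha. alpha = 0 gives uniform
    sampling; larger alpha gives a stronger bias toward high-degree nodes.
    Every hyperedge is assumed to be non-empty.
    """
    rng = random.Random(seed)
    deg = node_degrees(hyperedges)
    remaining = list(range(len(hyperedges)))
    weights = [min(deg[v] for v in hyperedges[i]) ** alpha for i in remaining]

    chosen = []
    for _ in range(min(k, len(remaining))):
        # Weighted draw of one position, then remove it so it cannot repeat.
        pos = rng.choices(range(len(remaining)), weights=weights, k=1)[0]
        chosen.append(remaining.pop(pos))
        weights.pop(pos)
    return [hyperedges[i] for i in chosen]


if __name__ == "__main__":
    toy = [(1, 2, 3), (1, 2), (1, 4), (5, 6), (1, 2, 5)]
    print(biased_hyperedge_sample(toy, k=3, alpha=2.0))
```

Using the minimum degree means a hyperedge gets a large weight only when all of its nodes are well connected. Other choices (for example, the mean or maximum degree) would give a different bias, and nothing in the abstract settles which one is used.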

Funder

Korea government (MSIT) grant funded by the Korea government

Institute of Information & Communications Technology Planning & Evaluation (IITP) grant funded by the Korea government

Publisher

Association for Computing Machinery (ACM)

References: 85 articles.
