Affiliation:
1. The University of Hong Kong
2. CNRS, Univ. Grenoble Alpes
Abstract
Hypothesis testing is a statistical method used to draw conclusions about populations from sample data, typically represented in tables. With the prevalence of graph representations in real-life applications, hypothesis testing on graphs is gaining importance. In this work, we formalize node, edge, and path hypotheses on attributed graphs. We develop a sampling-based hypothesis testing framework, which can accommodate existing hypothesis-agnostic graph sampling methods. To achieve accurate and time-efficient sampling, we then propose a Path-Hypothesis-Aware SamplEr, PHASE, an m-dimensional random walk that accounts for the paths specified in the hypothesis. We further optimize its time efficiency and propose PHASE_opt. Experiments on three real datasets demonstrate the ability of our framework to leverage common graph sampling methods for hypothesis testing, and the superiority of hypothesis-aware sampling methods in terms of accuracy and time efficiency.
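To make the idea of sampling-based testing of a node hypothesis concrete, here is a minimal Python sketch. It is not PHASE or PHASE_opt, and it is not taken from the paper: it uses a plain hypothesis-agnostic random walk as the sampler and a one-sample t statistic as the test. The toy graph, the `rating` attribute, the null value 3.5, and all function names are assumptions made for illustration only.

```python
import math
import random
import statistics


def random_walk_sample(adj, start, n_steps, rng):
    """Hypothesis-agnostic simple random walk; returns the distinct nodes visited."""
    node = start
    visited = [start]
    seen = {start}
    for _ in range(n_steps):
        neighbors = adj[node]
        if not neighbors:  # dead end: stop the walk early
            break
        node = rng.choice(neighbors)
        if node not in seen:
            seen.add(node)
            visited.append(node)
    return visited


def one_sample_t(values, mu0):
    """t statistic for H0: population mean == mu0 (needs >= 2 non-constant values)."""
    n = len(values)
    sd = statistics.stdev(values)
    if sd == 0:
        raise ValueError("sample has zero variance; t statistic is undefined")
    return (statistics.fmean(values) - mu0) / (sd / math.sqrt(n))


# Toy attributed graph (assumed): 20 nodes, each linked to i±1 and i±5 (mod 20).
# Every node has degree 4, so a simple random walk is not degree-biased here.
N = 20
adj = {i: [(i - 1) % N, (i + 1) % N, (i + 5) % N, (i - 5) % N] for i in range(N)}
rating = {i: 3.0 + 0.5 * (i % 5) for i in range(N)}  # hypothetical node attribute

rng = random.Random(0)
sample_nodes = random_walk_sample(adj, start=0, n_steps=200, rng=rng)
sample_ratings = [rating[v] for v in sample_nodes]

# Node hypothesis (assumed): H0 "mean rating == 3.5" vs H1 "mean rating > 3.5".
t_stat = one_sample_t(sample_ratings, mu0=3.5)
print(f"sampled {len(sample_nodes)} nodes, t = {t_stat:.3f}")
```

On graphs with uneven degrees, a simple random walk over-samples high-degree nodes, so the sampled values would need reweighting before testing. The t statistic also treats the sampled values as approximately independent, which a random walk only approximates.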
Publisher
Association for Computing Machinery (ACM)