Abstract
AbstractDiscovering dense regions in a graph is a popular tool for analyzing graphs. While useful, analyzing such decompositions may be difficult without additional information. Fortunately, many real-world networks have additional information, namely node labels. In this paper we focus on finding decompositions that have dense inner subgraphs and that can be explained using labels. More formally, we construct a binary tree T with labels on non-leaves that we use to partition the nodes in the input graph. To measure the quality of the tree, we model the edges in the shell and the cross edges to the inner shells as a Bernoulli variable. We reward the decompositions with the dense regions by requiring that the model parameters are non-increasing. We show that our problem is NP-hard, even inapproximable if we constrain the size of the tree. Consequently, we propose a greedy algorithm that iteratively finds the best split and applies it to the current tree. We demonstrate how we can efficiently compute the best split by maintaining certain counters. Our experiments show that our algorithm can process networks with over million edges in few minutes. Moreover, we show that the algorithm can find the ground truth in synthetic data and produces interpretable decompositions when applied to real world networks.
Funder
Research council of Finland
University of Helsinki
Publisher
Springer Science and Business Media LLC
Reference45 articles.
1. Alvarez-Hamelin JI, Dall’Asta L, Barrat A, Vespignani A (2006) Large scale networks fingerprinting and visualization using the k-core decomposition. In: NIPS, pp 41–50
2. Anderson CJ, Wasserman S, Faust K (1992) Building stochastic blockmodels. Soc Netw 14(1):137–161
3. Atzmueller M, Doerfel S, Mitzlaff F (2016) Description-oriented community detection using exhaustive subgroup discovery. Inf Sci 329:965–984
4. Ayer M, Brunk H, Ewing G, Reid W (1955) An empirical distribution function for sampling with incomplete information. Ann Math Stat 26(4):641–647
5. Bader GD, Hogue CW (2003) An automated method for finding molecular complexes in large protein interaction networks. BMC Bioinform 4(1):1–27