Explainable decomposition of nested dense subgraphs

Published: 2024-07-10
ISSN: 1384-5810
Journal: Data Mining and Knowledge Discovery
Language: English
Abbreviated journal title: Data Min Knowl Disc

Author

Nikolaj Tatti

Abstract

Discovering dense regions in a graph is a popular tool for analyzing graphs. While useful, such decompositions may be difficult to analyze without additional information. Fortunately, many real-world networks have additional information, namely node labels. In this paper we focus on finding decompositions that have dense inner subgraphs and that can be explained using labels. More formally, we construct a binary tree T with labels on non-leaves that we use to partition the nodes in the input graph. To measure the quality of the tree, we model the edges in each shell and the cross edges to the inner shells as Bernoulli variables. We reward decompositions with dense inner regions by requiring that the model parameters are non-increasing. We show that our problem is NP-hard, and even inapproximable if we constrain the size of the tree. Consequently, we propose a greedy algorithm that iteratively finds the best split and applies it to the current tree. We demonstrate how we can efficiently compute the best split by maintaining certain counters. Our experiments show that our algorithm can process networks with over a million edges in a few minutes. Moreover, we show that the algorithm can find the ground truth in synthetic data and produces interpretable decompositions when applied to real-world networks.
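The scoring idea in the abstract can be illustrated with a small sketch. It rests on our own assumptions and is not the paper's implementation. Shells are given as a list ordered from innermost to outermost, and each edge is charged to the outer of its two endpoints' shells, so shell i owns its internal pairs plus its cross pairs to inner shells. The non-increasing constraint on the Bernoulli parameters is enforced with weighted pool-adjacent-violators isotonic regression (the approach associated with Ayer et al. 1955, reference 4 below). For binomial counts, this yields the order-constrained maximum-likelihood estimates. The function names `shell_counts`, `pava_nonincreasing`, `bernoulli_loglik` and `score_decomposition` are hypothetical. The sketch scores a fixed partition only and does not build the label tree or search for splits.

```python
import math


def shell_counts(edges, shells):
    """Count edges and node pairs owned by each shell (index 0 = innermost).

    Assumes the shells partition all nodes and each undirected edge is listed
    once with no self-loops. Shell i owns the pairs inside it plus the cross
    pairs to inner shells j < i.
    """
    level = {v: i for i, shell in enumerate(shells) for v in shell}
    pairs = []
    inner = 0  # number of nodes in shells strictly inside the current one
    for shell in shells:
        n = len(shell)
        pairs.append(n * (n - 1) // 2 + n * inner)
        inner += n
    counts = [0] * len(shells)
    for u, v in edges:
        # An edge lies inside a shell or crosses inward from the outer endpoint.
        counts[max(level[u], level[v])] += 1
    return counts, pairs


def pava_nonincreasing(values, weights):
    """Weighted isotonic regression forcing values[0] >= values[1] >= ..."""
    blocks = []  # each block: [weighted mean, total weight, number of shells]
    for y, w in zip(values, weights):
        blocks.append([y, w, 1])
        # Pool adjacent blocks while an inner block is sparser than the next.
        while len(blocks) > 1 and blocks[-2][0] < blocks[-1][0]:
            y2, w2, c2 = blocks.pop()
            y1, w1, c1 = blocks.pop()
            total = w1 + w2  # > 0 here, since y2 > y1 >= 0 implies w2 > 0
            blocks.append([(y1 * w1 + y2 * w2) / total, total, c1 + c2])
    fitted = []
    for y, _, c in blocks:
        fitted.extend([y] * c)
    return fitted


def bernoulli_loglik(counts, pairs, probs):
    """Log-likelihood of m edges out of n pairs per shell with parameter p."""
    total = 0.0
    for m, n, p in zip(counts, pairs, probs):
        if m > 0:
            total += m * math.log(p)
        if n - m > 0:
            total += (n - m) * math.log(1 - p)
    return total


def score_decomposition(edges, shells):
    """Fit non-increasing shell densities and return (probs, log-likelihood)."""
    counts, pairs = shell_counts(edges, shells)
    # Shells with no node pairs get density 0 and weight 0 in the regression.
    raw = [m / n if n > 0 else 0.0 for m, n in zip(counts, pairs)]
    probs = pava_nonincreasing(raw, pairs)
    return probs, bernoulli_loglik(counts, pairs, probs)
```

For example, consider a 4-clique as the inner shell and two outer nodes, each attached by a single edge. The inner shell owns 6 of 6 pairs and the outer shell owns 2 of 9, so the fitted parameters are 1 and 2/9. If an outer shell were denser than an inner one, pooling would merge them into a single shared parameter. That merging is how the non-increasing requirement penalizes decompositions whose dense regions are not nested inward.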

Funders

Research Council of Finland

University of Helsinki

Publisher

Springer Science and Business Media LLC

Link

https://link.springer.com/content/pdf/10.1007/s10618-024-01053-8.pdf

References (45 articles)

1. Alvarez-Hamelin JI, Dall’Asta L, Barrat A, Vespignani A (2006) Large scale networks fingerprinting and visualization using the k-core decomposition. In: NIPS, pp 41–50

2. Anderson CJ, Wasserman S, Faust K (1992) Building stochastic blockmodels. Soc Netw 14(1):137–161

3. Atzmueller M, Doerfel S, Mitzlaff F (2016) Description-oriented community detection using exhaustive subgroup discovery. Inf Sci 329:965–984

4. Ayer M, Brunk H, Ewing G, Reid W (1955) An empirical distribution function for sampling with incomplete information. Ann Math Stat 26(4):641–647

5. Bader GD, Hogue CW (2003) An automated method for finding molecular complexes in large protein interaction networks. BMC Bioinform 4(1):1–27