Affiliation:
1. Department of Statistics and Data Science, Southern Methodist University, 3225 Daniel Avenue, Dallas, Texas 75205, USA
2. Department of Statistics and Data Science, National University of Singapore, S16 Science Drive 2, 117546 Singapore
Abstract
Summary
Community detection is a crucial task in network analysis that can be significantly improved by incorporating subject-level information, i.e., covariates. Existing methods have shown that covariates are effective for low-degree nodes, but rarely discuss the case where communities have significantly different density levels, i.e., multiscale networks. In this paper, we introduce a novel method that addresses this challenge by constructing network-adjusted covariates, which combine the network connections and the covariates using a node-specific weight. This weight can be calculated without tuning parameters. We present novel theoretical results on the strong consistency of our method under degree-corrected stochastic blockmodels with covariates, even in the presence of misspecification and multiple sparse communities. Additionally, we establish a general lower bound for the community detection problem when both the network and covariates are present, which shows that our method is optimal in connection intensity up to a constant factor. Our method outperforms existing approaches in simulations and on a LastFM app user network. We then compare our method with others on a statistics publication citation network in which 30% of nodes are isolated, where our method produces reasonable and balanced results. Our method is implemented in the R package NAC.
Publisher
Oxford University Press (OUP)