Authors:
Fernandes Felipe Schreiber, Figueiredo Daniel Ratton, Dreveton Maximilien
Abstract
Network clustering tackles the problem of identifying sets of nodes (clusters or communities) that have similar connection patterns. However, in many modern scenarios, nodes also have attributes that are correlated with the network structure. Thus, network information (edges) and node information (attributes) can be jointly leveraged to design high-performance clustering algorithms. Under a general model for the network and node attributes, this thesis establishes an information-theoretic criterion for the exact recovery of community labels and characterizes a phase transition determined by the Chernoff-Hellinger divergence of the model. The criterion shows how network and attribute information can be traded off against each other to achieve exact recovery (e.g., more reliable network information requires less reliable attribute information). This thesis also presents two iterative clustering algorithms that greedily maximize the joint likelihood of the model under the assumption that the probability distributions of network edges and node attributes belong to exponential families. Extensive analysis of the two algorithms on both synthetic datasets and real benchmarks highlights their accuracy and performance with respect to other state-of-the-art approaches.
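To make the idea of greedily maximizing a joint likelihood more concrete, the following is a minimal sketch and not the algorithms from the thesis. It assumes Bernoulli edges and unit-variance Gaussian attributes (two particular exponential-family choices made here for illustration only), and the names `sample_toy_model` and `greedy_joint_clustering` are hypothetical. Each round re-estimates the block edge probabilities and the attribute means from the current labels, then moves every node to the label that maximizes its own edge-plus-attribute log-likelihood.

```python
import numpy as np


def sample_toy_model(n=200, p_in=0.3, p_out=0.05, shift=1.0, seed=0):
    """Draw a two-community graph with 2-D Gaussian node attributes (toy data)."""
    rng = np.random.default_rng(seed)
    z = np.repeat([0, 1], n // 2)
    probs = np.where(z[:, None] == z[None, :], p_in, p_out)
    upper = np.triu(rng.random((n, n)) < probs, k=1)   # no self-loops
    A = (upper | upper.T).astype(float)                # symmetric adjacency
    centers = np.array([[-shift, 0.0], [shift, 0.0]])
    X = centers[z] + rng.normal(size=(n, 2))
    return A, X, z


def greedy_joint_clustering(A, X, z0, n_clusters, n_iter=20, eps=1e-6):
    """Alternate parameter estimation and per-node label updates.

    Assumes Bernoulli edges and unit-variance Gaussian attributes.
    """
    n = A.shape[0]
    z = np.asarray(z0, dtype=int).copy()
    for _ in range(n_iter):
        Z = np.eye(n_clusters)[z]                      # one-hot labels, n x K
        sizes = Z.sum(axis=0)
        # Estimated edge probability between each pair of clusters.
        counts = Z.T @ A @ Z
        pairs = np.outer(sizes, sizes) - np.diag(sizes)
        P = np.clip(counts / np.maximum(pairs, 1), eps, 1 - eps)
        # Estimated attribute mean of each cluster.
        mu = (Z.T @ X) / np.maximum(sizes, 1)[:, None]
        # Log-likelihood of node i's edges if i carried label k.
        non_edges = 1 - A - np.eye(n)                  # exclude self-pairs
        edge_ll = A @ Z @ np.log(P).T + non_edges @ Z @ np.log(1 - P).T
        # Gaussian attribute log-likelihood, up to an additive constant.
        attr_ll = -0.5 * ((X[:, None, :] - mu[None, :, :]) ** 2).sum(axis=2)
        z_new = np.argmax(edge_ll + attr_ll, axis=1)
        if np.array_equal(z_new, z):
            break
        z = z_new
    return z
```

A possible usage, initializing the labels from the sign of the first attribute coordinate (again an arbitrary choice for this sketch): `A, X, z_true = sample_toy_model()` followed by `z_hat = greedy_joint_clustering(A, X, (X[:, 0] > 0).astype(int), n_clusters=2)`. Because both terms enter the same per-node score, stronger edge information can compensate for noisier attributes and vice versa, which is the kind of exchange the abstract describes.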
Publisher
Sociedade Brasileira de Computação - SBC