Affiliation:
1. Yahoo Labs, Spain
2. Aalto University, Finland
3. Harvard School of Engineering and Applied Sciences
4. Aalto University and Finnish Institute of Occupational Health, Finland
Abstract
We study a novel clustering problem in which the pairwise relations between objects are categorical. This problem can be viewed as clustering the vertices of a graph whose edges are of different types (colors). We introduce an objective function that ensures the edges within each cluster have, as much as possible, the same color. We show that the problem is NP-hard and propose a randomized algorithm with an approximation guarantee proportional to the maximum degree of the input graph. The algorithm iteratively picks a random edge as a pivot, builds a cluster around it, and removes the cluster from the graph. Although fast, easy to implement, and parameter-free, this algorithm tends to produce a relatively large number of clusters. To overcome this issue, we introduce a variant algorithm that modifies how the pivot is chosen and how the cluster is built around the pivot. Finally, to address the case where a fixed number of output clusters is required, we devise a third algorithm that directly optimizes the objective function based on the alternating-minimization paradigm.
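The pivot-based procedure described above can be sketched roughly as follows. This is only an illustrative sketch, not the authors' implementation: the abstract does not state how the cluster is grown around the pivot, so the rule used here (add every remaining vertex joined to both pivot endpoints by edges of the pivot's color) is an assumption, and the names `edge_color` and `pivot_clustering` are hypothetical.

```python
import random


def edge_color(edges, u, v):
    """Return the color of the undirected edge {u, v}, or None if absent."""
    return edges.get((u, v), edges.get((v, u)))


def pivot_clustering(vertices, edges, seed=None):
    """Cluster an edge-colored graph by repeatedly picking a random pivot edge.

    `edges` maps (u, v) tuples to a color. Returns a list of
    (cluster_vertices, cluster_color) pairs; leftover vertices become
    singleton clusters with color None.
    """
    rng = random.Random(seed)
    remaining = set(vertices)
    live = dict(edges)  # edges whose endpoints are both still unclustered
    clusters = []
    while live:
        # Pick a random pivot edge (sorted only to make seeding reproducible).
        u, v = rng.choice(sorted(live))
        color = live[(u, v)]
        cluster = {u, v}
        # ASSUMPTION: grow the cluster with vertices that are linked to both
        # pivot endpoints by edges of the pivot's color.
        for w in remaining - {u, v}:
            if edge_color(live, u, w) == color and edge_color(live, v, w) == color:
                cluster.add(w)
        clusters.append((cluster, color))
        # Remove the cluster and all edges touching it from the graph.
        remaining -= cluster
        live = {e: c for e, c in live.items()
                if e[0] in remaining and e[1] in remaining}
    clusters.extend(({w}, None) for w in sorted(remaining))
    return clusters


# Toy input: a red triangle and a blue triangle joined by one green edge.
toy_vertices = ["a", "b", "c", "d", "e", "f"]
toy_edges = {
    ("a", "b"): "red", ("a", "c"): "red", ("b", "c"): "red",
    ("d", "e"): "blue", ("d", "f"): "blue", ("e", "f"): "blue",
    ("c", "d"): "green",
}
toy_clusters = pivot_clustering(toy_vertices, toy_edges, seed=0)
```

Because the pivot is random, different seeds can give different partitions of the toy graph. If the green edge is picked first, the two triangles are split into three two-vertex clusters, which illustrates the tendency toward many small clusters noted above.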
We also extend our objective function to handle cases where the relations between objects are described by multiple labels. We modify our randomized approximation algorithm to optimize this extended objective function and show that its approximation guarantee remains proportional to the maximum degree of the graph.
We test our algorithms on synthetic and real data from the domains of social media, protein-interaction networks, and bibliometrics. Results reveal that our algorithms outperform a baseline algorithm both in the task of reconstructing a ground-truth clustering and in terms of objective-function value.
Funder
Yahoo! Internship program
Publisher
Association for Computing Machinery (ACM)
Cited by
19 articles.