Comparing Two Clusterings Using Matchings between Clusters of Clusters

Published: 2019-12-17
Issue:
Volume: 24
Page: 1-41
ISSN: 1084-6654
Container-title: ACM Journal of Experimental Algorithmics
Language: en
Short-container-title: ACM J. Exp. Algorithmics

Author:

Cazals F.¹, Mazauric D.¹, Tetley R.¹, Watrigant R.²

Affiliation:

1. Université Côte d'Azur, Inria, France

2. University Lyon, CNRS, ENS de Lyon, Université Claude Bernard Lyon 1, LIP UMR5668, France

Abstract

Clustering is a fundamental problem in data science, yet the variety of clustering methods and their sensitivity to parameters make clustering hard. To analyze the stability of a given clustering algorithm while varying its parameters, and to compare clusters yielded by different algorithms, several comparison schemes based on matchings, information theory, and various indices (Rand, Jaccard) have been developed. We go beyond these by providing a novel class of methods computing meta-clusters within each clustering: a meta-cluster is a group of clusters, together with a matching between these. Let the intersection graph of two clusterings be the edge-weighted bipartite graph in which the nodes represent the clusters, an edge represents a nonempty intersection between two clusters, and the weight of an edge is the number of common items. We introduce the so-called D-family-matching problem on intersection graphs, with D the upper bound on the diameter of the graph induced by the clusters of any meta-cluster. First, we prove NP-completeness and APX-hardness results, and show that simple strategies have an unbounded approximation ratio. Second, we design exact polynomial-time dynamic programming algorithms for some classes of graphs (in particular trees). Third, we propose efficient spanning-tree-based heuristic algorithms for general graphs. Our experiments illustrate the role of D as a scale parameter providing information on the relationship between clusters within a clustering and between two clusterings. They also show the advantages of our built-in mapping over classical cluster comparison measures such as the variation of information.
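The intersection graph defined in the abstract can be illustrated with a minimal Python sketch. It is not the authors' implementation: the function name `intersection_graph`, the label-list input format, and the toy data are assumptions made for illustration only. Each clustering is given as one cluster label per item, and the result maps each pair of intersecting clusters to the number of items they share.

```python
from collections import Counter


def intersection_graph(labels_a, labels_b):
    """Build the edge-weighted bipartite intersection graph of two clusterings.

    labels_a[i] and labels_b[i] are the cluster labels of item i in the
    first and second clustering (hypothetical input format). The result
    maps each edge (cluster_a, cluster_b) to its weight, i.e. the number
    of items the two clusters have in common. Pairs of clusters with an
    empty intersection get no edge.
    """
    if len(labels_a) != len(labels_b):
        raise ValueError("both clusterings must label the same items")
    # Every item contributes one unit of weight to exactly one edge.
    return dict(Counter(zip(labels_a, labels_b)))


# Toy data (assumed): six items clustered in two different ways.
labels_a = [0, 0, 0, 1, 1, 2]
labels_b = ["x", "x", "y", "y", "y", "z"]
edges = intersection_graph(labels_a, labels_b)

for (cluster_a, cluster_b), weight in sorted(edges.items()):
    print(f"{cluster_a} -- {cluster_b}: weight {weight}")
```

Because each item lies in exactly one cluster of each clustering, the edge weights always sum to the number of items, which makes a convenient sanity check on the construction.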

Publisher

Association for Computing Machinery (ACM)

Subject

Theoretical Computer Science

Link

https://dl.acm.org/doi/pdf/10.1145/3345951

Cited by 3 articles.

1. Clusters of COVID-19 Indicators in India: Characterization, Correspondence and Change Analysis;SN Computer Science;2022-04-05

2. Analyzing the Error Rates of Bitcoin Clustering Heuristics;IFIP Advances in Information and Communication Technology;2022

3. MaxMin clustering for historical analogy;SN Applied Sciences;2020-07-28