A divide-and-merge methodology for clustering

Published:2006-12 Issue:4 Volume:31 Page:1499-1525
ISSN:0362-5915
Container-title:ACM Transactions on Database Systems
language:en
Short-container-title:ACM Trans. Database Syst.

Author:

Cheng David¹, Kannan Ravi², Vempala Santosh¹, Wang Grant¹

Affiliation:

1. Massachusetts Institute of Technology, Cambridge, MA

2. Yale University, New Haven, CT

Abstract

We present a divide-and-merge methodology for clustering a set of objects that combines a top-down “divide” phase with a bottom-up “merge” phase. In contrast, previous algorithms use either top-down or bottom-up methods to construct a hierarchical clustering or produce a flat clustering using local search (e.g., k-means). For the divide phase, which produces a tree whose leaves are the elements of the set, we suggest an efficient spectral algorithm. When the data is in the form of a sparse document-term matrix, we show how to modify the algorithm so that it maintains sparsity and runs in linear space. The merge phase quickly finds the optimal partition that respects the tree for many natural objective functions, for example, k-means, min-diameter, min-sum, correlation clustering, etc. We present a thorough experimental evaluation of the methodology. We describe the implementation of a meta-search engine that uses this methodology to cluster results from web searches. We also give comparative empirical results on several real datasets.
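
The merge phase described in the abstract can be illustrated with a small dynamic program over the divide-phase tree. The following is a minimal sketch, not the authors' implementation. It assumes the tree is given as nested Python tuples (a leaf is an integer index into `points`, an internal node is a pair of subtrees), that every cluster must be the leaf set of some subtree, and that the objective is the k-means cost. The names `kmeans_cost` and `merge_phase` are hypothetical.

```python
from math import inf


def kmeans_cost(points):
    """Sum of squared distances from each point to the centroid of the group."""
    n = len(points)
    dim = len(points[0])
    centroid = [sum(p[d] for p in points) / n for d in range(dim)]
    return sum((p[d] - centroid[d]) ** 2 for p in points for d in range(dim))


def merge_phase(tree, points, k):
    """Return (cost, clusters) for the cheapest k-clustering that respects `tree`.

    Assumed representation (not from the paper): a leaf is an int index into
    `points`; an internal node is a 2-tuple (left_subtree, right_subtree).
    """

    def solve(node):
        # Returns (leaves under node, table), where table[j] = (cost, clusters)
        # is the best way to split this subtree into exactly j clusters.
        if isinstance(node, int):
            return [node], {1: (0.0, [[node]])}
        left_leaves, left = solve(node[0])
        right_leaves, right = solve(node[1])
        leaves = left_leaves + right_leaves
        # One cluster: keep the whole subtree together.
        table = {1: (kmeans_cost([points[i] for i in leaves]), [leaves])}
        # j clusters: give i to the left child and j - i to the right child.
        for j in range(2, k + 1):
            best = (inf, None)
            for i in range(1, j):
                if i in left and (j - i) in right:
                    cost = left[i][0] + right[j - i][0]
                    if cost < best[0]:
                        best = (cost, left[i][1] + right[j - i][1])
            if best[1] is not None:
                table[j] = best
        return leaves, table

    _, table = solve(tree)
    if k not in table:
        raise ValueError("tree has fewer than k leaves")
    return table[k]


# Usage: four 1-D points and a tree that pairs the nearby ones.
points = [(0.0,), (1.0,), (10.0,), (11.0,)]
tree = ((0, 1), (2, 3))
cost, clusters = merge_phase(tree, points, 2)
```

With this input, the two-cluster answer keeps each pair of nearby points together. Other objectives named in the abstract, such as min-diameter, would under the same assumptions replace `kmeans_cost` and the way child costs are combined.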

Publisher

Association for Computing Machinery (ACM)

Subject

Information Systems

Link

https://dl.acm.org/doi/pdf/10.1145/1189769.1189779

References: 39 articles.

1. Anderberg M. 1973. Cluster Analysis for Applications. Academic Press New York NY.

2. COOLCAT

3. Frequent term-based text clustering

Cited by 45 articles.

1. Fast Spectral Graph Partitioning with a Randomized Eigensolver;2023 IEEE High Performance Extreme Computing Conference (HPEC);2023-09-25

2. Functional parcellation of the hippocampus based on its layer-specific connectivity with default mode and dorsal attention networks;NeuroImage;2022-07

3. Neurofunctional Segmentation Shifts in the Hippocampus;Frontiers in Human Neuroscience;2021-11-01

4. All-Three: Near-optimal and domain-independent algorithms for near-duplicate detection;Array;2021-09

5. Iterated multilevel simulated annealing for large-scale graph conductance minimization;Information Sciences;2021-09