CURE

Published: 1998-06, Volume: 27, Issue: 2, Pages: 73-84
ISSN: 0163-5808
Container title: ACM SIGMOD Record
Language: en
Short container title: SIGMOD Rec.

Author:

Sudipto Guha¹, Rajeev Rastogi², Kyuseok Shim²

Affiliation:

1. Stanford University, Stanford, CA

2. Bell Laboratories, Murray Hill, NJ

Abstract

Clustering, in data mining, is useful for discovering groups and identifying interesting distributions in the underlying data. Traditional clustering algorithms either favor clusters with spherical shapes and similar sizes, or are very fragile in the presence of outliers. We propose a new clustering algorithm called CURE that is more robust to outliers, and identifies clusters having non-spherical shapes and wide variances in size. CURE achieves this by representing each cluster by a certain fixed number of points that are generated by selecting well scattered points from the cluster and then shrinking them toward the center of the cluster by a specified fraction. Having more than one representative point per cluster allows CURE to adjust well to the geometry of non-spherical shapes and the shrinking helps to dampen the effects of outliers. To handle large databases, CURE employs a combination of random sampling and partitioning. A random sample drawn from the data set is first partitioned and each partition is partially clustered. The partial clusters are then clustered in a second pass to yield the desired clusters. Our experimental results confirm that the quality of clusters produced by CURE is much better than that of the clusters found by existing algorithms. Furthermore, they demonstrate that random sampling and partitioning enable CURE not only to outperform existing algorithms but also to scale well for large databases without sacrificing clustering quality.
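The two passes described in the abstract can be illustrated with a minimal Python sketch. It is not the authors' implementation and rests on several assumptions: Euclidean distance; a greedy farthest-point rule for picking the `c` "well scattered" points; a shrink fraction `alpha` that moves each scattered point toward the cluster centroid; inter-cluster distance taken as the closest pair of representatives; and a brute-force closest-pair search instead of any index structure. The function names `cure`, `merge_until`, and `representatives`, the per-partition reduction factor `q`, and the defaults `c=4`, `alpha=0.3`, `q=3` are hypothetical choices made for illustration. Outlier elimination and the assignment of unsampled points to clusters are omitted, and every cluster keeps its full point list in memory for simplicity.

```python
import math
import random
from typing import List, Optional, Sequence, Tuple

Point = Tuple[float, ...]
Cluster = List[Point]


def mean(points: Sequence[Point]) -> Point:
    """Component-wise centroid of a non-empty collection of points."""
    n = len(points)
    return tuple(sum(coords) / n for coords in zip(*points))


def representatives(points: Sequence[Point], c: int, alpha: float) -> List[Point]:
    """Pick up to c well-scattered points, then shrink each toward the centroid by alpha."""
    center = mean(points)
    chosen: List[int] = []
    for _ in range(min(c, len(points))):
        best_i, best_d = -1, -1.0
        for i, p in enumerate(points):
            if i in chosen:
                continue
            # First pick: farthest from the centroid. Later picks: the point whose
            # nearest already-chosen point is farthest away (assumed greedy rule).
            if chosen:
                d = min(math.dist(p, points[j]) for j in chosen)
            else:
                d = math.dist(p, center)
            if d > best_d:
                best_i, best_d = i, d
        chosen.append(best_i)
    # Move each scattered point a fraction alpha of the way toward the centroid.
    return [tuple(x + alpha * (m - x) for x, m in zip(points[j], center)) for j in chosen]


def merge_until(clusters: List[Cluster], target: int, c: int, alpha: float) -> List[Cluster]:
    """Agglomerative merging: repeatedly join the two clusters whose closest
    representatives are nearest, until `target` clusters remain."""
    clusters = [list(cl) for cl in clusters]
    reps = [representatives(cl, c, alpha) for cl in clusters]
    while len(clusters) > max(target, 1):
        best_d, bi, bj = math.inf, -1, -1
        for i in range(len(clusters)):  # brute-force closest-pair search
            for j in range(i + 1, len(clusters)):
                d = min(math.dist(p, q) for p in reps[i] for q in reps[j])
                if d < best_d:
                    best_d, bi, bj = d, i, j
        clusters[bi] = clusters[bi] + clusters[bj]
        reps[bi] = representatives(clusters[bi], c, alpha)
        del clusters[bj], reps[bj]  # bj > bi, so index bi stays valid
    return clusters


def cure(points: Sequence[Point], k: int, c: int = 4, alpha: float = 0.3,
         sample_size: Optional[int] = None, partitions: int = 1, q: int = 3,
         seed: int = 0) -> List[Cluster]:
    """Sample, partially cluster each partition, then merge the partial clusters to k."""
    rng = random.Random(seed)
    if sample_size is not None and sample_size < len(points):
        sample = rng.sample(list(points), sample_size)
    else:
        sample = list(points)
    if not sample:
        return []
    size = math.ceil(len(sample) / partitions)
    partial: List[Cluster] = []
    for start in range(0, len(sample), size):
        chunk = [[p] for p in sample[start:start + size]]
        # Pass 1: reduce each partition to about len(chunk) / q clusters (never below k).
        partial.extend(merge_until(chunk, max(k, len(chunk) // q), c, alpha))
    # Pass 2: cluster the partial clusters into the final k clusters.
    return merge_until(partial, k, c, alpha)


if __name__ == "__main__":
    data = [(0.0, 0.0), (0.1, 0.2), (0.2, 0.1), (5.0, 5.0), (5.1, 4.9), (4.9, 5.2)]
    for cluster in cure(data, k=2, partitions=2):
        print(sorted(cluster))
```

Keeping several shrunken representatives, rather than one centroid, lets an elongated cluster be reached through whichever representative lies nearest a neighbor, while the shrinking pulls a far-out point toward the center so it is less likely to bridge two clusters.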

Publisher

Association for Computing Machinery (ACM)

Subject

Information Systems,Software

Link

https://dl.acm.org/doi/pdf/10.1145/276305.276312


Cited by 959 articles.

1. A Nonparametric Split and Kernel-Merge Clustering Algorithm. IEEE Transactions on Artificial Intelligence, 2024-09.

2. Comprehensive analysis of clustering algorithms: exploring limitations and innovative solutions. PeerJ Computer Science, 2024-08-29.

3. An autonomous centreless approach to chunk-wise data partitioning. Evolving Systems, 2024-08-05.

4. A novel hybridization approach to improve the critical distance clustering algorithm: Balancing speed and quality. Expert Systems with Applications, 2024-08.

5. Clustering and forecasting of day-ahead electricity supply curves using a market-based distance. International Journal of Electrical Power & Energy Systems, 2024-07.