Abstract
Large-scale learning algorithms are essential for modern data collections, which may contain billions of data points. Here we study the design of parallel \(k\)-clustering algorithms, which include the \(k\)-median, \(k\)-medoids, and \(k\)-means clustering problems. We design efficient parallel algorithms for these problems and prove that they still compute constant-factor approximations to the optimal solution for stable clustering instances. In addition to our theoretical results, we present computational experiments showing that our \(k\)-median and \(k\)-means algorithms work well in practice: we are able to find better clusterings than state-of-the-art coreset constructions using samples of the same size.
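For context, the three objectives named above are the standard ones; the abstract does not restate them, so the following is a sketch of their usual definitions for a point set \(X\) and a set \(C\) of \(k\) centers (stated here in the Euclidean case, with the metric case obtained by replacing \(\|x - c\|\) by a general distance \(d(x, c)\)):
\[
\mathrm{cost}_{\mathrm{median}}(C) \;=\; \sum_{x \in X} \min_{c \in C} \|x - c\|,
\qquad
\mathrm{cost}_{\mathrm{means}}(C) \;=\; \sum_{x \in X} \min_{c \in C} \|x - c\|^2,
\]
with \(k\)-medoids defined like \(k\)-median under the additional constraint \(C \subseteq X\). A constant-factor approximation is then a solution \(C\) with \(\mathrm{cost}(C) \le \alpha \cdot \mathrm{cost}(C^*)\) for some fixed constant \(\alpha\), where \(C^*\) is an optimal set of \(k\) centers.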