Local search methods for k-means with outliers

Published:2017-03 Issue:7 Volume:10 Page:757-768
ISSN:2150-8097
Container-title:Proceedings of the VLDB Endowment
language:en
Short-container-title:Proc. VLDB Endow.

Author:

Gupta Shalmoli¹,Kumar Ravi²,Lu Kefu³,Moseley Benjamin³,Vassilvitskii Sergei²

Affiliation:

1. University of Illinois

2. Google

3. Washington University

Abstract

We study the problem of k-means clustering in the presence of outliers. The goal is to cluster a set of data points to minimize the variance of the points assigned to the same cluster, with the freedom of ignoring a small set of data points that can be labeled as outliers. Clustering with outliers has received a lot of attention in the data processing community, but practical, efficient, and provably good algorithms remain unknown for the most popular k-means objective. Our work proposes a simple local search-based algorithm for k-means clustering with outliers. We prove that this algorithm achieves constant-factor approximate solutions and can be combined with known sketching techniques to scale to large data sets. Using empirical evaluation on both synthetic and large-scale real-world data, we demonstrate that the algorithm dominates recently proposed heuristic approaches for the problem.
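
To make the objective in the abstract concrete, here is a minimal sketch in Python. It is not the paper's algorithm: it only shows the k-means cost when the z farthest points are ignored as outliers, plus a naive single-swap local search over that cost. The names `cost_with_outliers` and `local_search`, the parameter `z` (number of outliers), the restriction of candidate centers to input points, and the arbitrary starting centers are all assumptions made for illustration.

```python
def sq_dist(p, q):
    """Squared Euclidean distance between two points given as tuples."""
    return sum((a - b) ** 2 for a, b in zip(p, q))


def cost_with_outliers(points, centers, z):
    """k-means cost: sum of squared distances to the nearest center,
    ignoring the z points that are farthest from every center."""
    dists = sorted(min(sq_dist(p, c) for c in centers) for p in points)
    return sum(dists[: len(dists) - z])


def local_search(points, k, z, eps=1e-9):
    """Naive single-swap local search (illustrative assumption, not the
    paper's method). Centers are restricted to input points; a swap is
    accepted only if it lowers the cost by more than eps."""
    centers = list(points[:k])  # arbitrary starting solution
    best = cost_with_outliers(points, centers, z)
    improved = True
    while improved:
        improved = False
        for i in range(k):
            for p in points:
                if p in centers:
                    continue
                # Replace the i-th center with candidate p and re-evaluate.
                trial = centers[:i] + [p] + centers[i + 1:]
                c = cost_with_outliers(points, trial, z)
                if c < best - eps:
                    centers, best, improved = trial, c, True
    return centers, best


# Two tight groups plus one far-away point that should be treated as an outlier.
points = [(0, 0), (0, 1), (10, 10), (10, 11), (100, 100)]
centers, best = local_search(points, k=2, z=1)
print(centers, best)
```

In this toy input, the far-away point (100, 100) would dominate the cost if it had to be assigned to a cluster; allowing one outlier lets the search place one center in each tight group instead. Each accepted swap strictly lowers the cost, so the loop terminates, but this sketch carries none of the approximation guarantees stated in the abstract.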

Publisher

VLDB Endowment

Subject

General Earth and Planetary Sciences,Water Science and Technology,Geography, Planning and Development

Link

https://dl.acm.org/doi/pdf/10.14778/3067421.3067425

Cited by 65 articles.

1. MapReduce algorithms for robust center-based clustering in doubling metrics;Journal of Parallel and Distributed Computing;2024-12

2. Efficient and robust clustering based on backbone identification;Pattern Recognition;2024-11

3. RHiREM: Intelligent diagnostic framework for pipeline Eddy Current Internal Inspection based on reinforcement learning with hierarchical reward exploration mechanism;NDT & E International;2024-06

4. Robust $k$-Means-Type Clustering for Noisy Data;IEEE Transactions on Neural Networks and Learning Systems;2024

5. Clustering approximation via a fusion of multiple random samples;Information Fusion;2024-01