k-Means+++: Outliers-Resistant Clustering

Published: 2020-11-27, Volume: 13, Issue: 12, Page: 311
ISSN: 1999-4893
Container-title: Algorithms
Language: en
Short-container-title: Algorithms

Authors:

Adiel Statman, Liat Rozenberg, Dan Feldman

Abstract

The k-means problem is to compute a set of k centers (points) that minimizes the sum of squared distances to a given set of n points in a metric space. Arguably, the most common algorithm for solving it is k-means++, which is easy to implement and provides a provably small approximation error in time that is linear in n. We generalize k-means++ to support outliers in two senses (simultaneously): (i) non-metric spaces, e.g., M-estimators, where the distance dist(p,x) between a point p and a center x is replaced by min{dist(p,x), c} for an appropriate constant c that may depend on the scale of the input; and (ii) k-means clustering with m≥1 outliers, i.e., where the m points farthest from any given k centers are excluded from the total sum of distances. This is achieved by a simple reduction to (k+m)-means clustering (with no outliers).
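The abstract names three ingredients: k-means++ seeding, the truncated (M-estimator style) distance min{dist(p,x), c}, and a cost that ignores the m farthest points. The Python sketch below illustrates them under stated assumptions: points are Euclidean tuples, the truncated distance is squared just like the ordinary k-means cost, and `seed_with_outliers` is a hypothetical helper that reads the reduction simply as "seed k+m centers with plain k-means++". It does not reproduce the paper's algorithm, any post-processing step, or its approximation guarantees.

```python
import math
import random
from typing import Callable, List, Sequence, Tuple

Point = Tuple[float, ...]
DistFn = Callable[[Point, Point], float]


def dist(p: Point, x: Point) -> float:
    """Euclidean distance dist(p, x)."""
    return math.dist(p, x)


def truncated_dist(c: float) -> DistFn:
    """Assumed M-estimator-style distance: min{dist(p, x), c}."""
    return lambda p, x: min(dist(p, x), c)


def nearest_sq(p: Point, centers: Sequence[Point], dist_fn: DistFn = dist) -> float:
    """Squared distance from p to its closest center under dist_fn."""
    return min(dist_fn(p, x) for x in centers) ** 2


def cost(points: Sequence[Point], centers: Sequence[Point],
         dist_fn: DistFn = dist) -> float:
    """k-means cost: sum over points of the squared distance to the nearest center."""
    return sum(nearest_sq(p, centers, dist_fn) for p in points)


def cost_with_outliers(points: Sequence[Point], centers: Sequence[Point],
                       m: int, dist_fn: DistFn = dist) -> float:
    """k-means cost that excludes the m points farthest from the centers."""
    d2 = sorted(nearest_sq(p, centers, dist_fn) for p in points)
    return sum(d2[: max(len(d2) - m, 0)])


def kmeans_pp_seeding(points: Sequence[Point], k: int, rng: random.Random,
                      dist_fn: DistFn = dist) -> List[Point]:
    """Standard k-means++ (D^2) seeding: each new center is drawn with
    probability proportional to its squared distance to the closest chosen
    center. Stops early if every point already coincides with a center."""
    centers = [rng.choice(points)]
    d2 = [nearest_sq(p, centers, dist_fn) for p in points]
    while len(centers) < k and sum(d2) > 0:
        # Zero-weight points (already centers) are never drawn by choices().
        idx = rng.choices(range(len(points)), weights=d2)[0]
        centers.append(points[idx])
        d2 = [min(w, dist_fn(p, points[idx]) ** 2) for w, p in zip(d2, points)]
    return centers


def seed_with_outliers(points: Sequence[Point], k: int, m: int,
                       rng: random.Random) -> List[Point]:
    """Hypothetical helper: the reduction from the abstract, read here as
    'run plain k-means++ seeding with k + m centers'."""
    return kmeans_pp_seeding(points, k + m, rng)
```

For example, `seed_with_outliers(pts, k=2, m=1, rng=random.Random(0))` returns three distinct centers drawn from `pts`, and `cost_with_outliers(pts, centers, m=1)` then scores them while ignoring the single farthest point. Because D^2 sampling favors far-away points, an isolated outlier is likely to receive a center of its own, which is one intuition for allotting m extra centers.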

Publisher

MDPI AG

Subject

Computational Mathematics, Computational Theory and Mathematics, Numerical Analysis, Theoretical Computer Science

Link

https://www.mdpi.com/1999-4893/13/12/311/pdf

References (59 articles)

1. Knowledge Discovery and Data Mining: Towards a Unifying Framework; Fayyad, 1996

2. Vector Quantization and Signal Compression; Gersho, 2012

3. Pattern Classification and Scene Analysis; Duda, 1973

4. NP-hardness of Euclidean sum-of-squares clustering

Cited by (7 articles)

1. MapReduce algorithms for robust center-based clustering in doubling metrics; Journal of Parallel and Distributed Computing; 2024-12

2. Hybrid fuzzy clustering technique to enhance the performance based on a fusion of intuitionistic modified fuzzy c-means and improved genetic algorithm; International Journal of Data Science and Analytics; 2023-12-14

3. k-Median/Means with Outliers Revisited: A Simple FPT Approximation; Lecture Notes in Computer Science; 2023-12-09

4. Distributed k-Means with Outliers in General Metrics; Euro-Par 2023: Parallel Processing; 2023

5. Machine Learning-Enabled Internet of Things (IoT): Data, Applications, and Industry Perspective; Electronics; 2022-08-26