Abstract
Clustering is a fundamental procedure in the analysis of scientific data. It is used ubiquitously across the sciences. Despite decades of research, existing clustering algorithms have limited effectiveness in high dimensions and often require tuning parameters for different domains and datasets. We present a clustering algorithm that achieves high accuracy across multiple domains and scales efficiently to high dimensions and large datasets. The presented algorithm optimizes a smooth continuous objective, which is based on robust statistics and allows heavily mixed clusters to be untangled. The continuous nature of the objective also allows clustering to be integrated as a module in end-to-end feature learning pipelines. We demonstrate this by extending the algorithm to perform joint clustering and dimensionality reduction by efficiently optimizing a continuous global objective. The presented approach is evaluated on large datasets of faces, hand-written digits, objects, newswire articles, sensor readings from the Space Shuttle, and protein expression levels. Our method achieves high accuracy across all datasets, outperforming the best prior algorithm by a factor of 3 in average rank.
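As an illustration of what a smooth continuous clustering objective built on robust statistics might look like, here is a minimal sketch. The abstract does not state the objective itself, so everything specific here is an assumption made for illustration: the Geman-McClure penalty, the hand-written connectivity graph `edges`, the parameters `lam`, `mu`, and `step`, and the use of plain gradient descent. Each data point gets a representative; a data term keeps the representative near its point, while a robust pairwise term pulls connected representatives together and saturates for distant pairs, so a spurious edge between two groups has little effect.

```python
import numpy as np

def robust_penalty(sq_dist, mu):
    # Geman-McClure penalty (an assumed choice): close to sq_dist for small
    # values, saturating at mu for large ones.
    return mu * sq_dist / (mu + sq_dist)

def objective(X, U, edges, lam, mu):
    # Data term plus robust pairwise term over the graph edges.
    diff = U[edges[:, 0]] - U[edges[:, 1]]
    sq_dist = np.sum(diff ** 2, axis=1)
    data_term = 0.5 * np.sum((X - U) ** 2)
    pair_term = 0.5 * lam * np.sum(robust_penalty(sq_dist, mu))
    return data_term + pair_term

def gradient(X, U, edges, lam, mu):
    # Analytic gradient of the objective with respect to the representatives U.
    diff = U[edges[:, 0]] - U[edges[:, 1]]
    sq_dist = np.sum(diff ** 2, axis=1)
    weight = lam * (mu / (mu + sq_dist)) ** 2
    grad = U - X
    np.add.at(grad, edges[:, 0], weight[:, None] * diff)
    np.add.at(grad, edges[:, 1], -weight[:, None] * diff)
    return grad

def minimize(X, edges, lam=1.0, mu=1.0, step=0.1, iters=200):
    # Plain gradient descent, started from the data points themselves.
    U = X.copy()
    history = [objective(X, U, edges, lam, mu)]
    for _ in range(iters):
        U = U - step * gradient(X, U, edges, lam, mu)
        history.append(objective(X, U, edges, lam, mu))
    return U, history

# Toy data: two tight groups of three points each.
X = np.array([[0.0, 0.0], [0.1, 0.0], [0.0, 0.1],
              [5.0, 5.0], [5.1, 5.0], [5.0, 5.1]])
# Hypothetical graph: within-group edges plus one spurious cross edge (2, 3).
edges = np.array([[0, 1], [0, 2], [1, 2],
                  [3, 4], [3, 5], [4, 5],
                  [2, 3]])

U, history = minimize(X, edges)
```

In this toy run the representatives within each group move toward one another, while the saturating penalty keeps the spurious cross edge from merging the two groups. Clusters could then be read off by linking representatives that end up close together; that last step is not shown.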
Publisher
Proceedings of the National Academy of Sciences
Cited by
117 articles.