Subspace clustering for high dimensional data

Published: 2004-06 Issue: 1 Volume: 6 Pages: 90-105
ISSN: 1931-0145
Container-title: ACM SIGKDD Explorations Newsletter
Language: en
Short-container-title: SIGKDD Explor. Newsl.

Authors:

Parsons Lance¹, Haque Ehtesham¹, Liu Huan¹

Affiliation:

1. Arizona State University, Tempe, AZ

Abstract

Subspace clustering is an extension of traditional clustering that seeks to find clusters in different subspaces within a dataset. Often in high dimensional data, many dimensions are irrelevant and can mask existing clusters in noisy data. Feature selection removes irrelevant and redundant dimensions by analyzing the entire dataset. Subspace clustering algorithms localize the search for relevant dimensions allowing them to find clusters that exist in multiple, possibly overlapping subspaces. There are two major branches of subspace clustering based on their search strategy. Top-down algorithms find an initial clustering in the full set of dimensions and evaluate the subspaces of each cluster, iteratively improving the results. Bottom-up approaches find dense regions in low dimensional spaces and combine them to form clusters. This paper presents a survey of the various subspace clustering algorithms along with a hierarchy organizing the algorithms by their defining characteristics. We then compare the two main approaches to subspace clustering using empirical scalability and accuracy tests and discuss some potential applications where subspace clustering could be particularly useful.
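As a rough illustration of the bottom-up strategy the abstract describes, the sketch below bins each dimension into a grid, keeps the one-dimensional cells that hold enough points, and then tests only those higher-dimensional subspaces whose lower-dimensional projections were all dense. It is a simplified toy, not any specific algorithm from the survey: the function name `dense_units`, the parameters `n_bins`, `min_count`, and `max_dim`, and the sample data are all assumptions made for this example, and pruning is done per subspace rather than per cell to keep the code short.

```python
from collections import Counter
from itertools import combinations


def dense_units(points, n_bins=4, min_count=3, max_dim=2):
    """Map each subspace (tuple of dimension indices) to its dense grid cells.

    Hypothetical helper for illustration; assumes every coordinate is in [0, 1).
    """
    n_dims = len(points[0])
    # Discretize every coordinate into an equal-width bin index.
    binned = [tuple(min(int(x * n_bins), n_bins - 1) for x in p) for p in points]

    def count_dense(dims):
        # Count points per grid cell in the given subspace, keep dense cells.
        counts = Counter(tuple(b[d] for d in dims) for b in binned)
        return {cell for cell, c in counts.items() if c >= min_count}

    result = {}
    # Level 1: dense cells in each single dimension.
    for d in range(n_dims):
        cells = count_dense((d,))
        if cells:
            result[(d,)] = cells

    current = list(result)
    for k in range(2, max_dim + 1):
        prev = set(current)
        dims_in_play = sorted({d for s in prev for d in s})
        next_level = []
        for cand in combinations(dims_in_play, k):
            # Only test a k-dim subspace if all its (k-1)-dim projections
            # contained dense cells (subspace-level pruning).
            if all(sub in prev for sub in combinations(cand, k - 1)):
                cells = count_dense(cand)
                if cells:
                    result[cand] = cells
                    next_level.append(cand)
        current = next_level
    return result


# Toy data: dimensions 0 and 1 share a cluster, dimension 2 is spread out.
points = [
    (0.10, 0.10, 0.05),
    (0.12, 0.15, 0.30),
    (0.15, 0.12, 0.55),
    (0.20, 0.20, 0.80),
    (0.60, 0.90, 0.10),
    (0.90, 0.40, 0.60),
]
found = dense_units(points)
for subspace, cells in sorted(found.items()):
    print(subspace, sorted(cells))
```

With this toy data, dimension 2 never reaches the density threshold, so it is excluded before any two-dimensional subspace is examined. That early exclusion is the kind of localized search for relevant dimensions the abstract attributes to subspace clustering.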

Publisher

Association for Computing Machinery (ACM)

Link

https://dl.acm.org/doi/pdf/10.1145/1007730.1007731

References: 74 articles.

1. Database-friendly random projections

2. Re-designing distance functions and distance-based applications for high dimensional data

3. Fast algorithms for projected clustering

Cited by 665 articles.

1. A comprehensive review of clustering techniques in artificial intelligence for knowledge discovery: Taxonomy, challenges, applications and future prospects;Advanced Engineering Informatics;2024-10

2. Tensor-based multi-view spectral clustering via shared latent space;Information Fusion;2024-08

3. A parameter free relative density based biclustering method for identifying non-linear feature relations;Heliyon;2024-08

4. Contribution of El Niño Southern Oscillation (ENSO) Diversity to Low‐Frequency Changes in ENSO Variance;Geophysical Research Letters;2024-07-22

5. Sparse and geometry-aware generalisation of the mutual information for joint discriminative clustering and feature selection;Statistics and Computing;2024-07-17