Affiliation:
1. IBM Almaden Research Center, 650 Harry Road, San Jose, CA
Abstract
Data mining applications place special requirements on clustering algorithms, including the ability to find clusters embedded in subspaces of high dimensional data, scalability, end-user comprehensibility of the results, non-presumption of any canonical data distribution, and insensitivity to the order of input records. We present CLIQUE, a clustering algorithm that satisfies each of these requirements. CLIQUE identifies dense clusters in subspaces of maximum dimensionality. It generates cluster descriptions in the form of DNF expressions that are minimized for ease of comprehension. It produces identical results irrespective of the order in which input records are presented and does not presume any specific mathematical form for data distribution. Through experiments, we show that CLIQUE efficiently finds accurate clusters in large high dimensional datasets.
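The abstract does not describe the mechanism, so the following is only an illustrative sketch of what "dense clusters in subspaces" could look like in practice. It assumes a grid-based formulation: each dimension is cut into `xi` equal-width intervals, and a grid cell counts as dense when it holds more than a fraction `tau` of the points. The function name `dense_units`, the parameter names, the value range `[lo, hi)`, and the subspace-level pruning rule are assumptions made for this sketch, not details taken from the abstract. Merging dense cells into clusters and producing minimized DNF descriptions are left out.

```python
from collections import Counter
from itertools import combinations


def dense_units(points, xi=4, tau=0.3, lo=0.0, hi=1.0):
    """Return {subspace (tuple of dims): set of dense cells} for a point list.

    Hypothetical helper: a cell is 'dense' if it contains more than
    a fraction `tau` of all points.
    """
    n = len(points)
    d = len(points[0])
    width = (hi - lo) / xi

    def cell(value):
        # Map a coordinate to its interval index, clamping the upper edge.
        return min(int((value - lo) / width), xi - 1)

    grid = [[cell(v) for v in p] for p in points]

    def count_dense(dims):
        # Count points per cell of the subspace spanned by `dims`.
        counts = Counter(tuple(row[k] for k in dims) for row in grid)
        return {u for u, c in counts.items() if c / n > tau}

    # Level 1: dense intervals in each single dimension.
    level = {}
    for k in range(d):
        units = count_dense((k,))
        if units:
            level[(k,)] = units

    result = {}
    while level:
        result.update(level)
        size = len(next(iter(level))) + 1
        dims_present = sorted({k for dims in level for k in dims})
        next_level = {}
        # A subspace can only hold dense cells if every subspace with one
        # dimension fewer does too, so only those candidates are counted.
        for dims in combinations(dims_present, size):
            if all(sub in level for sub in combinations(dims, size - 1)):
                units = count_dense(dims)
                if units:
                    next_level[dims] = units
        level = next_level
    return result


# Toy data: tight in dimensions 0 and 1, spread out in dimension 2.
points = [(0.1, 0.1, z) for z in (0.1, 0.35, 0.6, 0.85) for _ in range(2)]
found = dense_units(points, xi=4, tau=0.3)
```

Because the result depends only on per-cell counts, it is the same for any ordering of `points`, which mirrors the order-insensitivity property the abstract claims.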
Publisher
Association for Computing Machinery (ACM)
Subject
Information Systems, Software
Cited by
720 articles.