Affiliation:
1. Lanzhou University
2. Chinese Academy of Sciences
Abstract
We propose a novel density estimation method that uses both the k-nearest neighbor (KNN) graph and the potential field of the data points to capture local and global information about the data distribution, respectively. Clustering is then performed on the computed density values: a forest of trees is built with each data point as a tree node, and the clusters are formed from the trees in the forest. The new clustering method is evaluated against three popular clustering methods: K-means++, Mean Shift, and DBSCAN. Experiments on two synthetic data sets and one real data set show that our approach can effectively improve the clustering results.
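The abstract does not give the formulas, so the following is only a rough sketch of one way the described pipeline could look, not the authors' implementation. Everything specific in it is an assumption made for illustration: the function name `density_forest_clusters`, the local term (inverse mean distance to the k nearest neighbors), the global term (a bounded potential `1 / (1 + d)` summed over all other points), the product used to combine them, and the rule that links each point to its nearest KNN neighbor of strictly higher density.

```python
import numpy as np


def density_forest_clusters(X, k=5):
    """Illustrative sketch only; all formulas below are assumptions."""
    X = np.asarray(X, dtype=float)
    n = X.shape[0]

    # Pairwise Euclidean distances; the diagonal is set to inf so that a
    # point is never its own neighbor (O(n^2) memory, fine for a small demo).
    D = np.sqrt(((X[:, None, :] - X[None, :, :]) ** 2).sum(axis=-1))
    np.fill_diagonal(D, np.inf)

    # KNN graph: indices of the k nearest neighbors, in ascending distance.
    knn = np.argsort(D, axis=1)[:, :k]
    knn_dist = np.take_along_axis(D, knn, axis=1)

    # Assumed local term: inverse of the mean distance to the k neighbors.
    local = 1.0 / (knn_dist.mean(axis=1) + 1e-12)

    # Assumed global term: potential summed over all other points.
    # The inf diagonal contributes 1 / (1 + inf) = 0.
    potential = (1.0 / (1.0 + D)).sum(axis=1)

    # Assumed combination: product of the two normalized terms.
    density = (local / local.max()) * (potential / potential.max())

    # Forest: each point's parent is its nearest KNN neighbor with strictly
    # higher density; a point with no such neighbor is the root of a tree.
    parent = np.arange(n)
    for i in range(n):
        for j in knn[i]:
            if density[j] > density[i]:
                parent[i] = j
                break

    # One cluster per tree. Visiting points in descending density order
    # guarantees every parent is labeled before its children.
    labels = np.full(n, -1)
    next_label = 0
    for i in np.argsort(-density):
        if parent[i] == i:
            labels[i] = next_label
            next_label += 1
        else:
            labels[i] = labels[parent[i]]
    return labels, density, parent
```

Calling `density_forest_clusters(X, k=5)` on an `(n, d)` array returns one label per point, along with the density values and the parent index of each point. Because links only follow KNN edges toward higher density, two groups of points that are far apart relative to their own spread cannot end up in the same tree, although a single group may still split into several trees under this simple linking rule.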
Publisher
Trans Tech Publications, Ltd.