Author:
Liu Hao, Oyama Satoshi, Kurihara Masahito, Sato Haruhiko
Abstract
Clustering is an important tool for data analysis, and many clustering techniques have been proposed over the years. Among them are density-based clustering methods, which have several benefits: the number of clusters need not be specified before clustering, the detected clusters can have arbitrary shapes, and outliers can be detected and removed. Recently, density-based algorithms were extended with fuzzy set theory, which has made these algorithms more robust. However, density-based clustering algorithms usually have a time complexity of O(n^2), where n is the number of data points in the data set, which makes them unsuitable for large-scale data sets. In this paper, a novel clustering algorithm called landmark fuzzy neighborhood DBSCAN (landmark FN-DBSCAN) is proposed. A landmark represents a subset of the input data set, which makes the algorithm efficient on large-scale data sets. We give a theoretical analysis of the time complexity and space complexity, which shows that both are linear in the size of the data set. The experiments show that landmark FN-DBSCAN is much faster than FN-DBSCAN and provides very good clustering quality.
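The idea of letting a landmark stand in for a subset of the data can be illustrated with a minimal sketch. The abstract does not say how landmarks are chosen, so the single-pass "leader" rule below, the function name `select_landmarks`, and the `radius` parameter are all assumptions made for illustration, not the authors' procedure. Each point is compared only with the landmarks found so far, so the pass costs roughly n times the number of landmarks rather than n^2.

```python
import math


def select_landmarks(points, radius):
    """Single-pass, leader-style landmark selection.

    Illustrative assumption only: a point joins the first existing landmark
    within `radius`; otherwise it becomes a new landmark itself.
    """
    landmarks = []  # representative points
    members = []    # members[i] = indices of points represented by landmarks[i]
    for idx, p in enumerate(points):
        for i, lm in enumerate(landmarks):
            if math.dist(p, lm) <= radius:
                members[i].append(idx)
                break
        else:
            # No landmark is close enough, so this point starts a new group.
            landmarks.append(p)
            members.append([idx])
    return landmarks, members


points = [(0.0, 0.0), (0.1, 0.0), (0.0, 0.1), (5.0, 5.0), (5.1, 5.0)]
landmarks, members = select_landmarks(points, radius=0.5)
print(len(landmarks), [len(m) for m in members])  # prints: 2 [3, 2]
```

In a sketch like this, a density-based method would then be run on the landmarks (weighted by how many points each represents) instead of on all n points, and every point would inherit the label of its landmark.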
Publisher
Fuji Technology Press Ltd.
Subject
Artificial Intelligence, Computer Vision and Pattern Recognition, Human-Computer Interaction
Cited by
3 articles.