Abstract
We analyze several ways of using a contiguity (neighbourhood) matrix as a constraint in clustering by the k-means and Ward methods, as well as by an approach based on distances and probabilistic assignments aimed at solving the multi-facility location problem (MFLP). Specifically, we propose two-stage algorithms that are forms of clustering with a relational constraint. They optimize the division of a set of objects into clusters while respecting the requirement that neighbours belong to the same cluster. For probabilistic d-clustering, a relevant modification of its target function is suggested and studied. A versatile simulation study and an empirical analysis verify the practical efficiency of these methods. The quality of clustering is assessed using indices of homogeneity, heterogeneity and correctness of clusters, as well as the silhouette index. Using these tools and similarity indices (Rand, Peirce, and Sokal and Sneath), we show that probabilistic d-clustering can produce better results than Ward’s algorithm. Compared with the k-means approach, probabilistic d-clustering gives rather similar results but is more robust to the creation of trivial (including empty) clusters and produces results that vary less across replications in terms of correctness, i.e. it is more predictable in terms of clustering quality.
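As an illustration of the similarity indices mentioned above, the following is a minimal sketch of the plain (unadjusted) Rand index, which is the share of object pairs on which two partitions agree. It is not taken from the paper: the function name `rand_index` and the list-of-labels input format are assumptions made for this example only.

```python
from itertools import combinations


def rand_index(labels_a, labels_b):
    """Unadjusted Rand index between two partitions of the same objects.

    labels_a and labels_b are sequences of cluster labels (hypothetical
    input format), where position i holds the label of object i.
    """
    if len(labels_a) != len(labels_b):
        raise ValueError("both partitions must label the same objects")
    n = len(labels_a)
    if n < 2:
        raise ValueError("at least two objects are needed to form a pair")

    agreements = 0
    pairs = 0
    for i, j in combinations(range(n), 2):
        together_in_a = labels_a[i] == labels_a[j]
        together_in_b = labels_b[i] == labels_b[j]
        # A pair counts as an agreement when both partitions put it
        # together, or both partitions keep it apart.
        if together_in_a == together_in_b:
            agreements += 1
        pairs += 1
    return agreements / pairs
```

The index depends only on which objects are grouped together, not on the label values themselves, so relabelling the clusters of one partition leaves it unchanged. This pairwise loop is quadratic in the number of objects, which is adequate for a sketch but not for large data sets.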
Funder
The President Stanislaw Wojciechowski State University of Applied Sciences
Publisher
Springer Science and Business Media LLC
Subject
Library and Information Sciences; Statistics, Probability and Uncertainty; Psychology (miscellaneous); Mathematics (miscellaneous)
Cited by
9 articles.