k-Means, Ward and Probabilistic Distance-Based Clustering Methods with Contiguity Constraint

Published: 2020-08-26
Volume: 38, Issue: 2, Pages: 313–352
ISSN: 0176-4268
Container-title: Journal of Classification
Language: en
Short-container-title: J Classif

Author:

Andrzej Młodak

Abstract

We analyze the possibilities of using a contiguity (neighbourhood) matrix as a constraint in clustering performed by the k-means and Ward methods, as well as by an approach based on distances and probabilistic assignments aimed at solving the multi-facility location problem (MFLP). Specifically, we propose special two-stage algorithms, which are a kind of clustering with relational constraints. They optimize the division of a set of objects into clusters while respecting the requirement that neighbours must belong to the same cluster. For probabilistic d-clustering, a relevant modification of its target function is suggested and studied. A versatile simulation study and an empirical analysis verify the practical efficiency of these methods. Clustering quality is assessed on the basis of indices of homogeneity, heterogeneity and correctness of clusters, as well as the silhouette index. Using these tools and similarity indices (Rand, Peirce, and Sokal and Sneath), it was shown that probabilistic d-clustering can produce better results than Ward's algorithm. Compared with the k-means approach, probabilistic d-clustering gives rather similar results, but it is more robust to the creation of trivial (including empty) clusters, and its results vary less across replications in terms of correctness; that is, it is more predictable from the point of view of clustering quality.
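The abstract does not spell out the two stages, so what follows is only a minimal sketch of one plausible reading, not the author's algorithm. It assumes stage 1 merges objects linked in a symmetric 0/1 contiguity matrix into groups (closed transitively with union-find), and stage 2 runs ordinary k-means on the size-weighted group centroids, so neighbours can never be split. The Rand index, one of the similarity indices named above, is included for comparing two partitions. All function names (`contiguity_groups`, `constrained_kmeans`, `rand_index`) are hypothetical, and neither Ward's method nor probabilistic d-clustering is implemented here.

```python
import random
from itertools import combinations


def contiguity_groups(C):
    """Stage 1 (assumed): merge objects i and j whenever C[i][j] == 1.

    Union-find closes the relation transitively, so a neighbour of a
    neighbour lands in the same group. Returns one group id per object.
    """
    n = len(C)
    parent = list(range(n))

    def find(i):
        while parent[i] != i:
            parent[i] = parent[parent[i]]  # path halving
            i = parent[i]
        return i

    for i in range(n):
        for j in range(i + 1, n):
            if C[i][j]:
                ri, rj = find(i), find(j)
                if ri != rj:
                    parent[ri] = rj
    ids = {}
    return [ids.setdefault(find(i), len(ids)) for i in range(n)]


def sq_dist(a, b):
    return sum((x - y) ** 2 for x, y in zip(a, b))


def weighted_mean(points, weights):
    total = sum(weights)
    return [sum(w * p[d] for p, w in zip(points, weights)) / total
            for d in range(len(points[0]))]


def constrained_kmeans(X, C, k, iters=100, seed=0):
    """Stage 2 (assumed): Lloyd's k-means on size-weighted group centroids."""
    group = contiguity_groups(C)
    g = max(group) + 1
    if k > g:
        raise ValueError("k exceeds the number of contiguity groups")
    members = [[x for x, gi in zip(X, group) if gi == c] for c in range(g)]
    sizes = [len(m) for m in members]
    cents = [weighted_mean(m, [1] * len(m)) for m in members]
    rng = random.Random(seed)
    centres = [cents[i] for i in rng.sample(range(g), k)]
    labels = None
    for _ in range(iters):
        # Assign each whole group to its nearest centre.
        new = [min(range(k), key=lambda c: sq_dist(p, centres[c])) for p in cents]
        if new == labels:
            break
        labels = new
        for c in range(k):
            idx = [i for i in range(g) if labels[i] == c]
            if idx:  # an emptied cluster keeps its previous centre
                centres[c] = weighted_mean([cents[i] for i in idx],
                                           [sizes[i] for i in idx])
    # Every object inherits the label of its group.
    return [labels[gi] for gi in group]


def rand_index(u, v):
    """Share of object pairs on which partitions u and v agree."""
    pairs = list(combinations(range(len(u)), 2))
    agree = sum((u[i] == u[j]) == (v[i] == v[j]) for i, j in pairs)
    return agree / len(pairs)


if __name__ == "__main__":
    X = [[0.0, 0.0], [0.2, 0.1], [5.0, 5.0], [5.1, 4.9], [0.1, 0.3], [4.8, 5.2]]
    C = [[0] * 6 for _ in range(6)]
    C[1][2] = C[2][1] = 1  # hypothetical: objects 1 and 2 are neighbours
    constrained = constrained_kmeans(X, C, k=2)
    free = constrained_kmeans(X, [[0] * 6 for _ in range(6)], k=2)
    # constrained[1] == constrained[2] always holds by construction.
    print(constrained, free, rand_index(constrained, free))
```

Clustering group centroids with size weights is exact for the k-means objective: for a group G assigned to centre μ, Σ_{x∈G} ‖x − μ‖² = Σ_{x∈G} ‖x − x̄_G‖² + |G|·‖x̄_G − μ‖², and the first term does not depend on the assignment. Treating contiguity transitively is an assumption of this sketch; it is what a strict "neighbours share a cluster" rule implies when the relation is chained.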

Funder

The President Stanislaw Wojciechowski State University of Applied Sciences

Publisher

Springer Science and Business Media LLC

Subject

Library and Information Sciences; Statistics, Probability and Uncertainty; Psychology (miscellaneous); Mathematics (miscellaneous)

Link

https://link.springer.com/content/pdf/10.1007/s00357-020-09370-5.pdf

References (45 articles)

1. Albatineh, A. N. (2010). Means and variances for a family of similarity indices used in cluster analysis. Journal of Statistical Planning and Inference, 140, 2828–2838.

2. Albatineh, A. N., & Niewiadomska-Bugaj, M. (2011). Correcting Jaccard and other similarity indices for chance agreement in cluster analysis. Advances in Data Analysis and Classification, 5, 179–200.

3. Albatineh, A. N., Niewiadomska-Bugaj, M., & Mihalko, D. (2006). On similarity indices and correction for chance agreement. Journal of Classification, 23, 301–313.

4. Ball, G. H., & Hall, D. J. (1967). A clustering technique for summarizing multivariate data. Behavioral Science, 12, 153–155.

5. Basu, S., Davidson, I., & Wagstaff, K. (Eds.). (2008). Constrained clustering: advances in algorithms, theory, and applications. Boca Raton: CRC Press.

Cited by 9 articles

1. Network Clustering; Wiley StatsRef: Statistics Reference Online; 2024-05-27

2. El rol del contexto educativo digital vs presencial en perfiles de engagement académico [The role of the digital vs. face-to-face educational context in academic engagement profiles]; Revista de Investigación Educativa; 2024-01-06

3. Exploring Some Spatially Constrained Delineation Methods in Segmenting the Malaysian Commercial Property Market; International Journal of Strategic Property Management; 2023-12-21

4. Influence of the socio-spatial context on the perception of environmental problems in cities in Spain and Argentina; Journal of Cleaner Production; 2023-11

5. Spatial distribution and trends of anemia among pregnant women in Ethiopia: EDHS 2005–2016; Frontiers in Public Health; 2023-02-16