Abstract
Two concepts that have recently been widely discussed in the taxonomy literature are cluster ensembles and stability. Șenbabaoğlu, Michailidis, and Li proposed an interesting way of combining them: they use the proportion of ambiguously clustered pairs (PAC) as a stability measure for selecting the optimal number of groups in a cluster ensemble. The proposal originated in genetic research but, as the authors themselves note, the method can also be applied successfully in other research areas.
The aim of this paper is to compare the number of clusters (the k parameter) indicated by the aggregated approach in taxonomy combined with the above-mentioned stability measure against the number indicated by classical indices (e.g. Caliński–Harabasz, Dunn, Davies–Bouldin).
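To make the stability measure concrete, the following is a minimal sketch of how a PAC value might be computed from repeated clusterings of subsamples. It is an illustration under stated assumptions, not the authors' implementation: the function names `consensus_indices` and `pac`, the input layout (one dict of item-to-label assignments per run), the thresholds 0.1 and 0.9, and the toy data are all hypothetical choices made here.

```python
from itertools import combinations


def consensus_indices(partitions):
    """Return, for each pair of items, the share of runs that put them together.

    `partitions` is a list of dicts (an assumed layout): each dict maps the
    items sampled in one run to the cluster label they received in that run.
    Only runs in which both items of a pair were sampled count for that pair.
    """
    sampled = {}
    together = {}
    for labels in partitions:
        for i, j in combinations(sorted(labels), 2):
            sampled[(i, j)] = sampled.get((i, j), 0) + 1
            if labels[i] == labels[j]:
                together[(i, j)] = together.get((i, j), 0) + 1
    return {pair: together.get(pair, 0) / n for pair, n in sampled.items()}


def pac(partitions, lower=0.1, upper=0.9):
    """Proportion of pairs whose consensus index lies in (lower, upper].

    The thresholds are illustrative assumptions; pairs with an index close to
    0 or 1 are treated as unambiguous.
    """
    values = list(consensus_indices(partitions).values())
    if not values:
        raise ValueError("no pair of items was ever sampled together")
    ambiguous = sum(1 for v in values if lower < v <= upper)
    return ambiguous / len(values)


# Hypothetical toy runs: four items, four runs for each candidate k.
runs_k2 = [{0: 0, 1: 0, 2: 1, 3: 1}] * 4  # identical partition in every run
runs_k3 = [
    {0: 0, 1: 0, 2: 1, 3: 1},
    {0: 0, 1: 1, 2: 1, 3: 1},
    {0: 0, 1: 0, 2: 1, 3: 1},
    {0: 0, 1: 1, 2: 0, 3: 1},
]

pac_by_k = {2: pac(runs_k2), 3: pac(runs_k3)}
best_k = min(pac_by_k, key=pac_by_k.get)  # lower PAC means a more stable k
print(pac_by_k, best_k)
```

In this sketch the candidate k with the lowest PAC is taken as the most stable one. In practice the runs would come from a base clustering algorithm applied to resampled data, and the resulting k could then be set side by side with the values suggested by the classical indices.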
Publisher
Uniwersytet Lodzki (University of Lodz)