Abstract
One of the most frequently used algorithms for clustering data with both numeric and categorical variables is the k-prototypes algorithm, an extension of the well-known k-means clustering. Gower’s distance is another popular approach for dealing with mixed-type data and is suitable not only for numeric and categorical but also for ordinal variables. This paper proposes a modification of the k-prototypes algorithm that uses Gower’s distance and ensures convergence. The result is a tool that makes it possible to take ordinal information into account for clustering and that can also be used for large data. A simulation study demonstrates convergence, good clustering results, and short runtimes.
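To make the role of Gower’s distance concrete, the following is a minimal Python sketch of the standard, equally weighted Gower distance between two mixed-type observations. It is not the modified k-prototypes algorithm proposed in the paper. The function name `gower_distance`, the `kinds` and `ranges` arguments, and the coding of ordinal levels as integer ranks are all assumptions made for illustration.

```python
def gower_distance(x, y, kinds, ranges):
    """Equally weighted Gower distance between observations x and y.

    kinds[j]  -- "numeric", "categorical" or "ordinal" (hypothetical labels)
    ranges[j] -- for numeric variables: max - min over the data set;
                 for ordinal variables: number of levels - 1, with the
                 levels coded as integer ranks; ignored for categorical.
    """
    total = 0.0
    for xj, yj, kind, rng in zip(x, y, kinds, ranges):
        if kind == "categorical":
            # Simple matching: 0 if the categories agree, 1 otherwise.
            total += 0.0 if xj == yj else 1.0
        else:
            # Numeric and rank-coded ordinal variables: range-scaled
            # absolute difference, which always lies in [0, 1].
            total += abs(xj - yj) / rng if rng > 0 else 0.0
    # Average the per-variable contributions.
    return total / len(x)


# Illustrative values only: one numeric, one categorical, one ordinal variable.
kinds = ("numeric", "categorical", "ordinal")
ranges = (4.0, None, 2)  # numeric range 4.0; ordinal with 3 levels (ranks 1..3)
d = gower_distance((1.0, "a", 1), (3.0, "b", 3), kinds, ranges)
```

Because the ordinal variable contributes a scaled rank difference rather than a 0/1 mismatch, neighbouring levels count as closer than distant ones. This is the ordinal information that simple matching on categories would discard.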
Publisher
Springer Science and Business Media LLC