Clustering large mixed-type data with ordinal variables

Published:2024-05-27
ISSN:1862-5347
Container-title:Advances in Data Analysis and Classification
language:en
Short-container-title:Adv Data Anal Classif

Authors:

Szepannek Gero, Aschenbruck Rabea, Wilhelm Adalbert

Abstract

One of the most frequently used algorithms for clustering data with both numeric and categorical variables is the k-prototypes algorithm, an extension of the well-known k-means clustering. Gower’s distance is another popular approach for dealing with mixed-type data and is suitable not only for numeric and categorical but also for ordinal variables. This paper proposes a modification of the k-prototypes algorithm that uses Gower’s distance and ensures convergence. The result is a tool that takes ordinal information into account for clustering and can also be applied to large data. A simulation study demonstrates convergence, good clustering results, and small runtimes.
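To make the role of Gower’s distance concrete, the following minimal Python sketch computes it for a single pair of mixed-type observations. It is an illustration under stated assumptions, not the paper’s implementation: the function name `gower_distance`, the `kinds` and `ranges` arguments, and the example data are hypothetical. Ordinal values are assumed to be pre-encoded as integer ranks and scaled by their range, which is one common convention and may differ from the variant used in the paper.

```python
def gower_distance(x, y, kinds, ranges):
    """Average per-variable dissimilarity between two observations.

    kinds[j]  : "num", "ord" or "cat" (hypothetical labels for this sketch)
    ranges[j] : max - min of variable j over the data set (ignored for "cat")
    Assumption: ordinal values are already encoded as integer ranks.
    """
    total = 0.0
    for xj, yj, kind, rng in zip(x, y, kinds, ranges):
        if kind == "cat":
            # Simple matching: 0 if the categories agree, 1 otherwise.
            total += 0.0 if xj == yj else 1.0
        else:
            # Numeric values and ordinal ranks: range-normalised absolute difference.
            total += abs(xj - yj) / rng if rng else 0.0
    return total / len(kinds)


# Made-up example: (age, satisfaction rank on a 1..3 scale, colour)
kinds = ("num", "ord", "cat")
ranges = (40.0, 2, None)
a = (30.0, 1, "red")
b = (50.0, 3, "blue")

d = gower_distance(a, b, kinds, ranges)
print(round(d, 4))
```

Each variable contributes a value in [0, 1], so the overall distance also lies in [0, 1] regardless of the original measurement scales.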

Funder

Hochschule Stralsund

Publisher

Springer Science and Business Media LLC

Link

https://link.springer.com/content/pdf/10.1007/s11634-024-00595-5.pdf

References: 39 articles.

1. Ahmad A, Khan S (2018) Survey of state-of-the-art mixed data clustering algorithms. IEEE Access 7:31883–902

2. Aschenbruck R, Szepannek G (2020) Cluster validation for mixed-type data. Arch Data Sci Ser A 6(1):02. https://doi.org/10.5445/KSP/1000098011/02

3. Aschenbruck R, Szepannek G, Luenke K, Wilhelm A (2023) Heterogeneity in class: clustering student’s attitudes towards statistics. Stat Appl. https://doi.org/10.26398/IJAS.0034-008

4. Aschenbruck R, Szepannek G, Wilhelm A (2023) Random-based initialization strategies for clustering mixed-type data with the k-prototypes algorithm. In: Coretto P, Giordano G, La Rocca M, Parella M, Rampichini C (eds) CLADAG 2023 book of abstracts and short papers, vol 207. Pearson, pp 38–41

5. Aschenbruck R, Szepannek G, Wilhelm AFX (2022) Imputation strategies for clustering mixed-type data with missing values. J Classif. https://doi.org/10.1007/s00357-022-09422-y

Cited by 1 article.

1. clustMixType: k-Prototypes Clustering for Mixed Variable-Type Data; CRAN: Contributed Packages; 2016-02-27