Imputation Strategies for Clustering Mixed-Type Data with Missing Values

Published:2022-11-26 Issue:1 Volume:40 Page:2-24
ISSN:0176-4268
Container-title:Journal of Classification
language:en
Short-container-title:J Classif

Author:

Aschenbruck Rabea, Szepannek Gero, Wilhelm Adalbert F. X.

Abstract

Incomplete data sets with different data types are difficult to handle but are regularly encountered in practical clustering tasks. Therefore, in this paper, two procedures for clustering mixed-type data with missing values are derived and analyzed in a simulation study with respect to the factors of partition, prototypes, imputed values, and cluster assignment. Both approaches are based on the k-prototypes algorithm (an extension of k-means), which is one of the most common clustering methods for mixed-type data (i.e., numerical and categorical variables). For k-means clustering of incomplete data, the k-POD algorithm has recently been proposed, which imputes the missing values with the values of the associated cluster center. We derive an adaptation of the latter and additionally present a cluster aggregation strategy after multiple imputation. It turns out that even a simplified and time-saving variant of the presented method can compete with multiple imputation and subsequent pooling.
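The k-POD idea mentioned in the abstract (fill each missing entry with the corresponding value of the assigned cluster center, then re-cluster) can be illustrated with a minimal sketch. The code below is an assumption-laden illustration for purely numerical data with k-means, not the authors' k-prototypes adaptation; the function name `kpod_sketch`, its parameters, the column-mean initial fill, and the random choice of initial centers are all hypothetical choices made for this example.

```python
import numpy as np


def kpod_sketch(X, k, n_iter=20, seed=0):
    """Illustrative k-POD-style clustering of numeric data containing NaNs.

    Hypothetical helper: assumes every column has at least one observed value.
    """
    rng = np.random.default_rng(seed)
    X = np.asarray(X, dtype=float)
    missing = np.isnan(X)

    # Initial fill (an assumption of this sketch): observed column means.
    col_means = np.nanmean(X, axis=0)
    X_filled = np.where(missing, col_means, X)

    # Initial centers: k distinct rows of the filled data.
    centers = X_filled[rng.choice(len(X_filled), size=k, replace=False)]
    labels = np.zeros(len(X_filled), dtype=int)

    for _ in range(n_iter):
        # Assignment step: nearest center by squared Euclidean distance.
        dists = ((X_filled[:, None, :] - centers[None, :, :]) ** 2).sum(axis=2)
        labels = dists.argmin(axis=1)

        # Update step: mean of each non-empty cluster (empty ones keep their center).
        for j in range(k):
            members = labels == j
            if members.any():
                centers[j] = X_filled[members].mean(axis=0)

        # Imputation step: missing entries take the value of the assigned center.
        X_filled = np.where(missing, centers[labels], X_filled)

    return labels, centers, X_filled


if __name__ == "__main__":
    X = np.array([
        [0.0, 0.1], [0.2, np.nan], [np.nan, 0.0],
        [10.0, 10.1], [10.2, np.nan], [np.nan, 9.9],
    ])
    labels, centers, X_filled = kpod_sketch(X, k=2)
    print(labels)
    print(X_filled)
```

Observed entries are never modified; only the entries that were missing in the input are overwritten in each iteration. For mixed-type data, a k-prototypes variant would additionally need a categorical dissimilarity and a mode-based prototype for categorical variables, which this sketch deliberately leaves out.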

Funder

Hochschule Stralsund

Publisher

Springer Science and Business Media LLC

Subject

Library and Information Sciences; Statistics, Probability and Uncertainty; Psychology (miscellaneous); Mathematics (miscellaneous)

Link

https://link.springer.com/content/pdf/10.1007/s00357-022-09422-y.pdf
