Clustering Validation Inference

Published:2024-07-27 Issue:15 Volume:12 Page:2349
ISSN:2227-7390
Container-title:Mathematics
language:en
Short-container-title:Mathematics

Author:

Figuera Pau¹, Cuzzocrea Alfredo², García Bringas Pablo¹

Affiliation:

1. Faculty of Engineering, University of Deusto, 48007 Bilbao, Spain

2. iDEA Lab, University of Calabria, 87036 Rende, Italy

Abstract

Clustering validation is applied to evaluate the quality of classifications. This step is crucial for unsupervised machine learning. A plethora of methods exist for this purpose; however, a common drawback is that statistical inference is not possible. In this study, we construct a density function for the cluster number. For this purpose, we use smoothing techniques. Then, we apply non-negative matrix factorization using the Kullback–Leibler divergence. Employing a unique linearly independent uncorrelated observational variable hypothesis, we construct a sequence by varying the dimension of the span space of the factorization using only analytical techniques. The expectation of the limit of this sequence follows a gamma probability density function. Then, identifying the dimension of the span space of the factorization with the number of clusters, we transform the estimation of a suitable factorization dimension into a probabilistic estimate of the number of clusters. This approach is an internal validation method that is suitable for numerical and categorical multivariate data and independent of the clustering technique. Our main achievement is a predictive clustering validation model with graphical abilities. It provides results in terms of credibility, thus making it possible to compare results, such as expert judgment, on a quantitative basis.
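
As a rough illustration of one ingredient named in the abstract, the sketch below factorizes a non-negative matrix under the Kullback–Leibler divergence while varying the inner dimension k. It is not the authors' procedure: it uses the standard Lee and Seung multiplicative updates, a hypothetical two-block toy matrix, and hypothetical names (`nmf_kl`, `kl_divergence`, `divergences`). The smoothing step, the analytical sequence, and the gamma density described in the abstract are not reproduced here.

```python
import numpy as np


def kl_divergence(X, WH, eps=1e-12):
    """Generalized Kullback-Leibler divergence D(X || WH)."""
    X = np.asarray(X, dtype=float)
    return float(np.sum(X * np.log((X + eps) / (WH + eps)) - X + WH))


def nmf_kl(X, k, n_iter=200, seed=0, eps=1e-12):
    """Factor a non-negative (n x m) matrix X as W @ H with inner dimension k.

    Uses the Lee-Seung multiplicative updates for the KL divergence.
    This is an illustrative stand-in, not the method of the paper.
    """
    rng = np.random.default_rng(seed)
    n, m = X.shape
    W = rng.random((n, k)) + eps
    H = rng.random((k, m)) + eps
    for _ in range(n_iter):
        WH = W @ H + eps
        # W_ia <- W_ia * sum_j H_aj X_ij / (WH)_ij / sum_j H_aj
        W *= ((X / WH) @ H.T) / (H.sum(axis=1) + eps)
        WH = W @ H + eps
        # H_aj <- H_aj * sum_i W_ia X_ij / (WH)_ij / sum_i W_ia
        H *= (W.T @ (X / WH)) / (W.sum(axis=0)[:, None] + eps)
    return W, H


# Hypothetical toy data: two disjoint blocks of positive values.
rng = np.random.default_rng(42)
X = np.zeros((20, 8))
X[:10, :4] = rng.uniform(5.0, 10.0, size=(10, 4))
X[10:, 4:] = rng.uniform(5.0, 10.0, size=(10, 4))

# Divergence of the fitted factorization for each candidate dimension k.
divergences = {}
for k in range(1, 5):
    W, H = nmf_kl(X, k)
    divergences[k] = kl_divergence(X, W @ H)

for k, d in divergences.items():
    print(f"k={k}: D(X || WH) = {d:.4f}")
```

On block-structured data like this, the divergence typically drops sharply once k reaches the number of blocks and changes little afterwards. The paper goes further by turning the choice of k into a probabilistic estimate, which this sketch does not attempt.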

Publisher

MDPI AG

Link

https://www.mdpi.com/2227-7390/12/15/2349/pdf
