Abstract
AbstractClustering techniques are convenient tools for preparing and organizing unstructured and unclassified data. Depending on the data, they can be used to prepare for an analysis or to gain insight. However, choosing a clustering technique can be challenging when dealing with high-dimensional datasets. Most often, application requirements and data distribution need to be considered. Since clustering is defined as a complex problem to calculate, different algorithms may produce different results that meet the application's needs. This study presents an automated threshold-based and goal-oriented clustering procedure. It is based on the AutoML mechanism to estimate the most suitable hyperparameters according to predefined needs and can learn four clustering performance metrics thresholds for a given dataset. The significant advantages of this method are the automatic selection of clustering technique (i.e., partitional, hierarchical, density-based, or graph-based) and the ability to determine the output dynamically, according to predefined goals. We tested our method over four datasets and analyzed the results according to different goals. The results show that our method improved the silhouette score by 549.5% (from 0.105 to 0.682) compared to popular and commonly used K-means. Furthermore, clustering based on multiple metrics yielded more information than clustering by a single metric.
Publisher
Springer Nature Singapore
Reference54 articles.
1. Barlow HB (1989) Unsupervised learning. Neural Comput 1(3):295–311. https://doi.org/10.1162/neco.1989.1.3.295
2. Sindhu Meena K, Suriya S (2019) A survey on supervised and unsupervised learning techniques. In: International conference on artificial intelligence, smart grid and smart city applications. Springer, Cham, pp 627–644. https://doi.org/10.1007/978-3-030-24051-6_58
3. Berry MW, Mohamed A, Yap BW (eds) (2019) Supervised and unsupervised learning for data science. Springer Nature, Cham
4. Elavarasi SA, Akilandeswari J, Sathiyabhama B (2011) A survey on partition clustering algorithms. Int J Enterp Comput Bus Syst 1(1):1–14
5. Patel S, Sihmar S, Jatain A (2015) A study of hierarchical clustering algorithms. In: 2015 2nd International conference on computing for sustainable global development (INDIACom). IEEE, New Delhi, pp 537–541