A Brief Comparison of K-means and Agglomerative Hierarchical Clustering Algorithms on Small Datasets

Published: 2022 Page: 623-632
ISSN: 1876-1100
Container-title: Proceedings of 2021 International Conference on Wireless Communications, Networking and Applications

Author:

Abdalla Hassan I.

Abstract

In this work, the agglomerative hierarchical clustering and K-means clustering algorithms are implemented on small datasets. Because the choice of similarity measure is a vital factor in data clustering, two measures are used in this study - cosine similarity and Euclidean distance - along with two evaluation metrics - entropy and purity - to assess clustering quality. The datasets are taken from the UCI Machine Learning Repository. The experimental results indicate that K-means clustering outperformed hierarchical clustering in terms of entropy and purity when using the cosine similarity measure. However, hierarchical clustering outperformed K-means clustering when using Euclidean distance. The performance of a clustering algorithm is therefore highly dependent on the similarity measure. Moreover, as the number of clusters is increased within a reasonable range, the clustering algorithms’ performance improves.
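
The two evaluation metrics named in the abstract can be illustrated with a minimal sketch. The abstract does not give the paper's exact formulas, so the code below assumes the commonly used definitions: purity as the fraction of points that fall in the majority class of their cluster, and entropy as the size-weighted average of per-cluster class entropy (in bits, not normalized). The function names and the toy labels are hypothetical and are not taken from the paper.

```python
from collections import Counter
from math import log2


def _group_by_cluster(labels_true, labels_pred):
    """Collect the true class labels of the points assigned to each cluster."""
    clusters = {}
    for true_label, cluster_id in zip(labels_true, labels_pred):
        clusters.setdefault(cluster_id, []).append(true_label)
    return clusters


def purity(labels_true, labels_pred):
    """Fraction of points in the majority class of their cluster (higher is better)."""
    clusters = _group_by_cluster(labels_true, labels_pred)
    majority_total = sum(max(Counter(m).values()) for m in clusters.values())
    return majority_total / len(labels_true)


def entropy(labels_true, labels_pred):
    """Size-weighted mean of per-cluster class entropy in bits (lower is better)."""
    clusters = _group_by_cluster(labels_true, labels_pred)
    n = len(labels_true)
    total = 0.0
    for members in clusters.values():
        size = len(members)
        h = -sum((c / size) * log2(c / size) for c in Counter(members).values())
        total += (size / n) * h
    return total


# Hypothetical toy data: six points, two true classes, two predicted clusters.
truth = ["a", "a", "a", "b", "b", "b"]
pred = [0, 0, 1, 1, 1, 1]

print(f"purity  = {purity(truth, pred):.4f}")
print(f"entropy = {entropy(truth, pred):.4f}")
```

Under these definitions, purity can only stay the same or rise as clusters are split further, and it reaches 1 when every point is its own cluster, which is one reason to read the abstract's last observation with the number of clusters in mind.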

Publisher

Springer Nature Singapore

Link

https://link.springer.com/content/pdf/10.1007/978-981-19-2456-9_64

References: 13 articles.

1. Amer, A.A.: On K-means clustering-based approach for DDBSs design. J. Big Data 7(1), 1–31 (2020). https://doi.org/10.1186/s40537-020-00306-9

2. Amer, A., Mohamed, M., Al_Asri, K.: ASGOP: an aggregated similarity-based greedy-oriented approach for relational DDBSs design. Heliyon 6(1), e03172 (2020)

3. Amer, A., Abdalla, H., Nguyen, L.: Enhancing recommendation systems performance using highly-effective similarity measures. Knowl.-Based Syst. 217, 106842 (2021)

4. Amer, A.A., Abdalla, H.I.: A set theory based similarity measure for text clustering and classification. J. Big Data 7(1), 1–43 (2020). https://doi.org/10.1186/s40537-020-00344-3

5. Lee, C., Hung, C., Lee, S.: A comparative study on clustering algorithms. In: 14th ACIS International Conference on Software Engineering, Artificial Intelligence, Networking and Parallel/Distributed Computing, Honolulu, HI, pp. 557–562 (2013)

Cited by 8 articles.

1. Assessment of Pepper Robot’s Speech Recognition System through the Lens of Machine Learning;Biomimetics;2024-06-27

2. Unbiased Metabolomics of Volatile Secondary Metabolites in Essential Oils Originated from Myrtaceae Species;Chemistry Africa;2024-06-06

3. A Comparative Analysis between K-Means and Agglomerative Clustering Techniques in Maritime Skill Certification;Compiler;2024-05-31

4. Tropical tropospheric aerosol sources and chemical composition observed at high altitude in the Bolivian Andes;Atmospheric Chemistry and Physics;2024-03-05

5. Exploring stingless bee honey from selected regions of Peninsular Malaysia through gas chromatography–mass spectrometry–based untargeted metabolomics;Journal of Food Science;2024-01-14