Evaluation of Data Clustering Accuracy using K-Means Algorithm

Published:2023-12-21 Issue:01 Volume:2 Page:385-396
ISSN:2987-226X
Container-title:International Journal of Multidisciplinary Approach Research and Science
Short-container-title:Int. J. Multidiscipline. Approach. Res. Sci.

Author:

Suraya Suraya, Sholeh Muhammad, Lestari Uning

Abstract

Data clustering is a data science method widely used in data analysis to group the records of a dataset and to reveal patterns or relationships among them. This research evaluates the accuracy of data clustering with the K-Means algorithm on the wine dataset, which has 13 features describing the chemical characteristics of three types of wine. The clustering results of K-Means are assessed with two evaluation metrics, the Davies-Bouldin index and the Silhouette score, and the number of clusters is chosen to give the best metric values. The research steps involve data standardization, selection of the optimal number of clusters, and assessment of clustering accuracy. The research method follows KDD (Knowledge Discovery in Databases), which consists of pre-processing, transformation, model building, and model evaluation. Experimental results show that appropriate parameters and cluster initialization can improve the clustering evaluation metrics. On the normalized dataset, the Davies-Bouldin index indicates 2 groups and the Silhouette score indicates 3 groups; before normalization, Davies-Bouldin indicates 7 groups and Silhouette indicates 2 groups. In conclusion, the normalized and non-normalized datasets yield different evaluation metrics. The number of groups to choose depends on the context of the data analysis; here 3 groups are selected, which can be labelled "Superior Variety", "Intermediate Variety", and "Standard Variety".
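
The workflow summarized in the abstract (standardize the features, run K-Means for several candidate cluster counts, then compare the Silhouette score and the Davies-Bouldin index) might be sketched as follows. This is an illustrative sketch only and not the authors' code: the tiny two-feature dataset `raw` is made up rather than taken from the wine dataset, the farthest-first centroid initialization is an assumption chosen to keep the run deterministic, and all function names are hypothetical.

```python
import math

def standardize(rows):
    """Z-score each column: subtract the mean, divide by the standard deviation."""
    cols = list(zip(*rows))
    means = [sum(c) / len(c) for c in cols]
    stds = [math.sqrt(sum((v - m) ** 2 for v in c) / len(c)) or 1.0
            for c, m in zip(cols, means)]
    return [[(v - m) / s for v, m, s in zip(r, means, stds)] for r in rows]

def kmeans(points, k, iters=100):
    """Lloyd's algorithm; farthest-first initialization is an assumption here."""
    centroids = [points[0]]
    while len(centroids) < k:
        centroids.append(max(points, key=lambda p: min(math.dist(p, c) for c in centroids)))
    labels = None
    for _ in range(iters):
        new_labels = [min(range(k), key=lambda j: math.dist(p, centroids[j])) for p in points]
        if new_labels == labels:
            break  # assignments stopped changing
        labels = new_labels
        for j in range(k):
            members = [p for p, lab in zip(points, labels) if lab == j]
            if members:  # keep the old centroid if a cluster becomes empty
                centroids[j] = [sum(v) / len(members) for v in zip(*members)]
    return labels

def silhouette(points, labels):
    """Mean of (b - a) / max(a, b) over all points; higher is better."""
    total = 0.0
    for i, p in enumerate(points):
        own = [q for j, q in enumerate(points) if labels[j] == labels[i] and j != i]
        if not own:
            continue  # singleton cluster contributes 0 by convention
        a = sum(math.dist(p, q) for q in own) / len(own)
        others = [[q for q, lab in zip(points, labels) if lab == c]
                  for c in set(labels) if c != labels[i]]
        b = min(sum(math.dist(p, q) for q in g) / len(g) for g in others)
        total += (b - a) / max(a, b)
    return total / len(points)

def davies_bouldin(points, labels):
    """Mean over clusters of the worst (s_i + s_j) / d_ij ratio; lower is better."""
    ids = sorted(set(labels))
    groups = {c: [p for p, lab in zip(points, labels) if lab == c] for c in ids}
    cents = {c: [sum(v) / len(g) for v in zip(*g)] for c, g in groups.items()}
    scatter = {c: sum(math.dist(p, cents[c]) for p in groups[c]) / len(groups[c]) for c in ids}
    worst = [max((scatter[i] + scatter[j]) / math.dist(cents[i], cents[j])
                 for j in ids if j != i) for i in ids]
    return sum(worst) / len(ids)

# Made-up data: three tight groups, with the second feature on a much larger scale.
raw = [[0, 0], [0, 100], [1, 0], [1, 100],
       [10, 0], [10, 100], [11, 0], [11, 100],
       [5, 1000], [5, 1100], [6, 1000], [6, 1100]]
data = standardize(raw)

scores = {}
for k in range(2, 5):
    labels = kmeans(data, k)
    scores[k] = (silhouette(data, labels), davies_bouldin(data, labels))

best_by_silhouette = max(scores, key=lambda k: scores[k][0])  # maximize
best_by_db = min(scores, key=lambda k: scores[k][1])          # minimize
print(scores, best_by_silhouette, best_by_db)
```

The two metrics point in opposite directions (Silhouette is maximized, Davies-Bouldin is minimized), so each suggests its own cluster count. On cleanly separated toy data they would be expected to agree; the abstract reports that on the wine dataset they do not, which is why the final choice is left to the analysis context. In practice a library such as scikit-learn, which provides `silhouette_score` and `davies_bouldin_score`, would normally replace the hand-written metric functions.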

Publisher

PT. Riset Press International

References: 24 articles.

1. Amanda, & Veronica Sitorus, M. (2021). Penerapan Algoritma K-Means Clustering Untuk Pengelompokan Konsumsi Produk Kosmetik milik PT Cedefindo. Jurnal Ilmiah MIKA AMIK Al Muslim, V(2), 63–68.

2. Asmiatun, S., Wakhidah, N., Putri, A. N., & Kunci, K. (2019). Identifikasi Kondisi Permukaan Jalan Menggunakan K-Means Clustering Road Surface Conditions Identification Using K-Means Clustering. November 2019, 23–30.

3. Awaludin, M. (2014). Penerapan Algoritma K-Means Clustering Pada K-Harmonic Means Untuk Schedule Preventive Maintenance Service. Jurnal Sistem Informasi Universitas Suryadarma, 6(1), 1–17. https://doi.org/10.35968/jsi.v6i1.271

4. Cielen, D., Meysman, A. D. B., & Ali, M. (2016). Introducing Data Science: Big Data, Machine Learning, and more, using Python tools. Manning Publications.

5. Deny Jollyta, Muhammad Siddik, Herman Mawengkang, S. E. (2021). Teknik Evaluasi Cluster Solusi Menggunakan Python Dan Rapidminer. Deepublish Publisher.