Preprocessing kNN algorithm classification using K-means and distance matrix with students’ academic performance dataset

Published:2020-10-21 Issue:4 Volume:8 Page:
ISSN:2338-0403
Container-title:Jurnal Teknologi dan Sistem Komputer

Author:

Sugriyono Sugriyono¹, Siregar Maria Ulfah²

Affiliation:

1. Master of Informatics Department, Sunan Kalijaga Islamic State University

2. Master of Informatics Department, Sunan Kalijaga Islamic State University Yogyakarta, Indonesia

Abstract

Outliers in a dataset can lower the accuracy of a classification process. They can be removed in a preprocessing stage before the classification algorithm is applied, and clustering can serve as the outlier detection method. This study applies K-means and a distance matrix to detect outliers and remove them from datasets with class labels. The dataset covers students’ academic performance and contains 6847 instances with 18 attributes and 3 class labels. Preprocessing applies the K-means method to obtain a centroid for each class. The distance matrix is then used to evaluate the distance from each instance to the centroids. Outliers, which are instances that belong to a different class, are removed from the dataset. This preprocessing improves the classification accuracy of the kNN algorithm. Without preprocessing, accuracy is 72.28 %; with data preprocessed using K-means with Euclidean distance it is 98.42 % (an increase of 26.14 percentage points), and using K-means with Manhattan distance it is 97.76 % (an increase of 25.48 percentage points).
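
The preprocessing step described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation, and it rests on several assumptions that the abstract does not confirm: each class centroid is taken as the class mean (what K-means converges to with a single cluster inside one class), an instance counts as an outlier when its nearest centroid belongs to a class other than its own label, and the function names and toy data are hypothetical.

```python
from collections import defaultdict
from math import sqrt


def euclidean(a, b):
    return sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def manhattan(a, b):
    return sum(abs(x - y) for x, y in zip(a, b))


def class_centroids(X, y):
    """Mean vector of each class (assumption: one centroid per class)."""
    groups = defaultdict(list)
    for row, label in zip(X, y):
        groups[label].append(row)
    return {
        label: [sum(col) / len(rows) for col in zip(*rows)]
        for label, rows in groups.items()
    }


def distance_matrix(X, centroids, metric):
    """Rows are instances, columns are class centroids in sorted label order."""
    labels = sorted(centroids)
    return labels, [[metric(row, centroids[c]) for c in labels] for row in X]


def remove_outliers(X, y, metric=euclidean):
    """Keep only instances whose nearest centroid matches their own label."""
    labels, dist = distance_matrix(X, class_centroids(X, y), metric)
    kept_X, kept_y = [], []
    for row, label, d in zip(X, y, dist):
        nearest = labels[d.index(min(d))]
        if nearest == label:
            kept_X.append(row)
            kept_y.append(label)
    return kept_X, kept_y


# Hypothetical toy data: the last point is labeled "low" but lies in the "high" group.
X = [[1.0, 1.0], [1.2, 0.8], [0.9, 1.1], [8.0, 8.0], [8.2, 7.9], [7.9, 8.1], [8.1, 8.0]]
y = ["low", "low", "low", "high", "high", "high", "low"]

clean_X, clean_y = remove_outliers(X, y)
print(len(X), "->", len(clean_X))  # 7 -> 6
```

Passing `metric=manhattan` switches the distance measure, mirroring the two variants compared in the abstract. The cleaned `clean_X` and `clean_y` would then be handed to a kNN classifier in place of the raw data.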

Funder

UIN Sunan Kalijaga, Yogyakarta, Indonesia

Publisher

Institute of Research and Community Services Diponegoro University (LPPM UNDIP)

Subject

General Earth and Planetary Sciences,General Environmental Science

Cited by 1 article.

1. Classification of beneficiaries for the rehabilitation of uninhabitable houses using the K-Nearest Neighbor algorithm;Jurnal Teknologi dan Sistem Komputer;2022-01-20