Investigating the Impact of Min-Max Data Normalization on the Regression Performance of K-Nearest Neighbor with Different Similarity Measurements

Published: 2022-06-21 Issue: 1 Volume: 10 Page: 85-91
ISSN: 2307-549X
Container-title: ARO-THE SCIENTIFIC JOURNAL OF KOYA UNIVERSITY
Short-container-title: ARO

Author:

Muhammad Ali Peshawa J.

Abstract

K-nearest neighbor (KNN) is a lazy supervised learning algorithm that depends on computing the similarity between the target and its closest neighbor(s). Min-max normalization, in turn, has been reported as a useful method for eliminating the impact of inconsistent attribute ranges on the efficiency of some machine learning models. The impact of min-max normalization on the performance of KNN models is still unclear and needs further investigation. Therefore, this research examines the impact of min-max normalization on the regression performance of KNN models using eight different similarity measures: City block, Euclidean, Chebychev, Cosine, Correlation, Hamming, Jaccard, and Mahalanobis. Five benchmark datasets were used to test the accuracy of the KNN models on both the original and the normalized data. Mean squared error (MSE) was used as the performance indicator to compare the results. It is concluded that the impact of min-max normalization on KNN models using City block, Euclidean, Chebychev, Cosine, and Correlation depends on the nature of the dataset itself; therefore, testing models on both the original and the normalized datasets is recommended. The performance of KNN models using Hamming, Jaccard, and Mahalanobis is unaffected by min-max normalization because of their ratio-based nature and the involvement of the dataset covariance in the similarity calculation. Results showed that Mahalanobis outperformed the other seven similarity measures. This research improves on comparable studies in reliability and quality because it tests datasets from different application fields.
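
The comparison described in the abstract can be illustrated with a minimal sketch. It is not the author's code or data: the six-row toy dataset, the helper names (`min_max_normalize`, `knn_predict`, `loo_mse`), the choice of k = 1, and the use of leave-one-out evaluation are all assumptions made for illustration, and only three of the eight similarity measures are shown. The toy data is deliberately built so that a wide-range, uninformative second attribute dominates the raw distances; on real datasets the abstract reports that the effect can go either way.

```python
import math

def min_max_normalize(rows):
    """Rescale every column to [0, 1] using (x - min) / (max - min)."""
    cols = list(zip(*rows))
    lows = [min(c) for c in cols]
    spans = [max(c) - min(c) for c in cols]
    # A constant column has zero span; map it to 0.0 to avoid dividing by zero.
    return [
        [(x - lo) / sp if sp else 0.0 for x, lo, sp in zip(row, lows, spans)]
        for row in rows
    ]

def city_block(a, b):
    return sum(abs(x - y) for x, y in zip(a, b))

def euclidean(a, b):
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

def chebychev(a, b):
    return max(abs(x - y) for x, y in zip(a, b))

def knn_predict(train_x, train_y, query, k, distance):
    """Predict by averaging the targets of the k closest training rows."""
    ranked = sorted(range(len(train_x)),
                    key=lambda i: distance(train_x[i], query))
    return sum(train_y[i] for i in ranked[:k]) / k

def loo_mse(xs, ys, k, distance):
    """Leave-one-out mean squared error of a KNN regressor."""
    total = 0.0
    for i in range(len(xs)):
        rest_x = xs[:i] + xs[i + 1:]
        rest_y = ys[:i] + ys[i + 1:]
        pred = knn_predict(rest_x, rest_y, xs[i], k, distance)
        total += (pred - ys[i]) ** 2
    return total / len(xs)

# Hypothetical toy data: the target equals the first attribute, while the
# second attribute has a much larger range and carries no signal.
xs = [[1.0, 1000.0], [2.0, 5000.0], [3.0, 1200.0],
      [4.0, 5200.0], [5.0, 1100.0], [6.0, 5100.0]]
ys = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0]

raw_mse = loo_mse(xs, ys, k=1, distance=euclidean)
norm_mse = loo_mse(min_max_normalize(xs), ys, k=1, distance=euclidean)

for name, dist in [("city block", city_block),
                   ("euclidean", euclidean),
                   ("chebychev", chebychev)]:
    before = loo_mse(xs, ys, k=1, distance=dist)
    after = loo_mse(min_max_normalize(xs), ys, k=1, distance=dist)
    print(f"{name}: raw MSE={before:.3f}, normalized MSE={after:.3f}")
```

Running both variants side by side, as the loop does, mirrors the abstract's recommendation to test models on the original and the normalized data rather than assuming normalization helps.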

Publisher

Koya University

Subject

General Medicine

Cited by 16 articles.

1. Topological clustering in investigating spatial patterns of particulate matter between air quality monitoring stations in Malaysia;Air Quality, Atmosphere & Health;2024-06-19

2. Product Length Predictions with Machine Learning: An Integrated Approach Using Extreme Gradient Boosting;SN Computer Science;2024-06-18

3. Receive wireless sensor data through IoT gateway using web client based on border gateway protocol;Heliyon;2024-06

4. Investigating the Impact of Data Normalization Methods on Predicting Electricity Consumption in a Building Using Different Artificial Neural Network Models;Sustainable Cities and Society;2024-06

5. Enhancing Parkinson's Disease Diagnosis: A Stacking Ensemble Approach Leveraging Machine Learning Techniques;2024 4th International Conference on Innovative Research in Applied Science, Engineering and Technology (IRASET);2024-05-16