Towards improving machine learning algorithms accuracy by benefiting from similarities between cases

Published:2021-01-04 Issue:1 Volume:40 Page:947-972
ISSN:1064-1246
Container-title:Journal of Intelligent & Fuzzy Systems
Short-container-title:IFS

Author:

Mostafa Samih M.¹

Affiliation:

1. Faculty of Computers and Information, South Valley University, Qena, Egypt

Abstract

Data preprocessing is a core step in data mining. It involves handling missing values, removing outliers and noise, normalizing data, and so on. Existing methods for handling missing values operate on the whole dataset and ignore its characteristics (e.g., similarities and differences between cases). This paper focuses on handling missing values with machine learning methods that take these characteristics into account. The proposed preprocessing method clusters the data, then imputes the missing values in each cluster using only the data belonging to that cluster rather than the whole dataset. The author performed a comparative study of the proposed method and ten popular imputation methods, namely mean, median, mode, KNN, IterativeImputer, IterativeSVD, Softimpute, Mice, Forimp, and Missforest. The experiments were done on four datasets with different numbers of clusters, sizes, and shapes. The empirical study showed that the proposed method was more effective in terms of imputation time, Root Mean Square Error (RMSE), Mean Absolute Error (MAE), and coefficient of determination (R² score), where the error measures reflect how close each imputed value is to the original removed value.
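
The cluster-then-impute idea described in the abstract can be illustrated with a minimal sketch. This is not the author's implementation: the abstract does not state which clustering algorithm or per-cluster imputer the paper uses, so the sketch assumes plain k-means with a provisional global-mean fill, followed by per-cluster mean imputation. The names `cluster_impute` and `imputation_scores` and the toy data are hypothetical, chosen only for illustration. The second function computes the three error measures named in the abstract (RMSE, MAE, R²) on values that were removed on purpose.

```python
import math
from statistics import mean


def cluster_impute(rows, k, n_iter=20):
    """Fill None entries using the column mean of each row's own cluster.

    Assumption: k-means plus per-cluster mean is a stand-in here; the paper's
    actual clustering and imputation choices are not given in the abstract.
    """
    n_cols = len(rows[0])
    # Global column means over observed values, used only for a provisional
    # fill so that distances can be computed (every column needs >= 1 value).
    global_means = [mean(r[j] for r in rows if r[j] is not None)
                    for j in range(n_cols)]
    filled = [[global_means[j] if v is None else v for j, v in enumerate(r)]
              for r in rows]

    # Plain k-means (Lloyd's algorithm) with a deterministic, evenly spaced init.
    centroids = [list(filled[i * len(rows) // k]) for i in range(k)]
    labels = [0] * len(rows)
    for _ in range(n_iter):
        labels = [min(range(k), key=lambda c: math.dist(row, centroids[c]))
                  for row in filled]
        for c in range(k):
            members = [row for row, lab in zip(filled, labels) if lab == c]
            if members:  # keep the old centroid if a cluster becomes empty
                centroids[c] = [mean(col) for col in zip(*members)]

    # Re-impute each missing entry from the observed values of its cluster only.
    result = [list(r) for r in rows]
    for c in range(k):
        idx = [i for i, lab in enumerate(labels) if lab == c]
        for j in range(n_cols):
            observed = [rows[i][j] for i in idx if rows[i][j] is not None]
            # Fall back to the global mean if the cluster has no observed value.
            fill = mean(observed) if observed else global_means[j]
            for i in idx:
                if result[i][j] is None:
                    result[i][j] = fill
    return result, labels


def imputation_scores(true_vals, imputed_vals):
    """RMSE, MAE and R^2 between removed true values and their imputations."""
    errors = [t - p for t, p in zip(true_vals, imputed_vals)]
    rmse = math.sqrt(mean(e * e for e in errors))
    mae = mean(abs(e) for e in errors)
    t_bar = mean(true_vals)
    ss_tot = sum((t - t_bar) ** 2 for t in true_vals)  # must be non-zero
    r2 = 1 - sum(e * e for e in errors) / ss_tot
    return rmse, mae, r2


# Toy data (hypothetical): two well-separated groups, one removed value each.
data = [[1.0, 1.0], [1.2, 0.8], [0.8, None], [1.0, 1.2],
        [10.0, 10.0], [10.2, 9.8], [None, 10.2], [9.8, 10.0]]
imputed, labels = cluster_impute(data, k=2)

# Pretend the removed values were 1.1 and 9.9 and score the imputations.
rmse, mae, r2 = imputation_scores([1.1, 9.9], [imputed[2][1], imputed[6][0]])
print(labels, imputed[2], imputed[6], rmse, mae, r2)
```

On this toy input a single global column mean would fill both gaps with a value lying between the two groups, while the per-cluster mean stays close to the neighbouring cases. That is the intuition behind imputing within clusters, but it says nothing about the results reported in the paper.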

Publisher

IOS Press

Subject

Artificial Intelligence,General Engineering,Statistics and Probability

References: 54 articles.

1. Roles of imputation methods for filling the missing values: A review;Norazian Ramli;Adv Environ Biol,2013

2. On the choice of the best imputation methods for missing values considering three groups of classification methods;Luengo;Knowl Inf Syst,2012

3. Imputation of missing data with neural networks for classification

4. Similarity-learning information-fusion schemes for missing data imputation;Razavi-Far;Knowledge-Based Syst,2020

5. Classifiers Accuracy Improvement Based on Missing Data Imputation;Jordanov;J Artif Intell Soft Comput Res,2018

Cited by 6 articles.

1. Predicting the efficiency of arsenic immobilization in soils by biochar using machine learning;Journal of Environmental Sciences;2025-01

2. High resolution photovoltaic power generation potential assessments of rooftop in China;Energy Reports;2022-11

3. An LVQ clustering algorithm based on neighborhood granules;Journal of Intelligent & Fuzzy Systems;2022-09-22

4. Tennis Video Target Tracking Based on Mobile Network Communication and Machine Learning Algorithm;International Transactions on Electrical Energy Systems;2022-09-10

5. Special Physical Characteristic and Training Strategy of Badminton Based on Machine Learning Algorithm;International Transactions on Electrical Energy Systems;2022-09-06