Efficient Pre-Processing Techniques for Improving Classifiers Performance

Published:2021-12-30
ISSN:1544-5976
Container-title:Journal of Web Engineering
Short-container-title:JWE

Author:

Nickolas S., Shobha K.

Abstract

Data pre-processing plays a vital role in the data mining life cycle and is essential for quality outcomes. This paper shows experimentally how much data pre-processing matters for achieving highly accurate classifier outcomes. Missing values are imputed with a novel imputation method, CLUSTPRO; highly correlated features are selected with Correlation-based Variable Selection (CVS); and imbalanced data is handled with the Synthetic Minority Over-sampling Technique (SMOTE). The proposed CLUSTPRO method uses the Random Forest (RF) and Expectation Maximization (EM) algorithms to impute missing values. The imputed results are evaluated using standard evaluation metrics, and CLUSTPRO outperforms existing state-of-the-art imputation methods. The combined approach of imputation, feature selection, and imbalanced data handling improves classification accuracy (measured by AUC) by 40%–50% compared with results obtained without any pre-processing.
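
The two standard steps named in the abstract, correlation-based feature selection and SMOTE-style oversampling, can be illustrated with a minimal pure-Python sketch. This is not the authors' implementation: the function names, the Pearson threshold, and the neighbour count `k` are assumptions chosen for illustration, and CLUSTPRO itself is not sketched because the abstract does not specify its details.

```python
import math
import random


def pearson(xs, ys):
    """Pearson correlation of two equal-length numeric sequences."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = math.sqrt(sum((x - mx) ** 2 for x in xs))
    sy = math.sqrt(sum((y - my) ** 2 for y in ys))
    if sx == 0 or sy == 0:
        return 0.0  # a constant column carries no correlation signal
    return cov / (sx * sy)


def select_by_correlation(rows, labels, threshold=0.5):
    """Indices of features whose |r| with the label is >= threshold.

    The threshold is a hypothetical parameter, not a value from the paper.
    """
    n_features = len(rows[0])
    return [
        j
        for j in range(n_features)
        if abs(pearson([r[j] for r in rows], labels)) >= threshold
    ]


def smote_like(minority, n_new, k=2, seed=0):
    """Create n_new synthetic minority samples by interpolating between a
    randomly chosen sample and one of its k nearest minority neighbours."""
    if len(minority) < 2:
        raise ValueError("need at least two minority samples")
    rng = random.Random(seed)
    synthetic = []
    for _ in range(n_new):
        base = rng.choice(minority)
        neighbours = sorted(
            (p for p in minority if p is not base),
            key=lambda p: math.dist(base, p),
        )[:k]
        other = rng.choice(neighbours)
        gap = rng.random()  # position along the segment base -> other
        synthetic.append([b + gap * (o - b) for b, o in zip(base, other)])
    return synthetic
```

In a real workflow the oversampling step would normally be applied only to the training split, after imputation and feature selection, so that synthetic points do not leak into the evaluation data.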

Publisher

River Publishers

Subject

Computer Networks and Communications, Information Systems, Software

Cited by 4 articles.

1. Empowering Healthcare With IoMT: Evolution, Machine Learning Integration, Security, and Interoperability Challenges;IEEE Access;2024

2. Comparison of Machine Learning Based on Category Theory;Journal of Web Engineering;2023-04-20

3. Product quality prediction based on RBF optimized by firefly algorithm;Journal of Systems Engineering and Electronics;2023

4. A Review on Machine Learning-Based WBCs Analysis in Blood Smear Images: Key Challenges, Datasets, and Future Directions;Studies in Big Data;2022