Affiliation:
1. Department of Computer Science and Engineering, Annamalai University, Chidambaram, India
2. Department of Information Technology, Annamalai University, Chidambaram, India
3. Department of Computer Science and Engineering (Data Science), CMR Technical Campus, Hyderabad, India
Abstract
In data mining, deep learning and machine learning models face class imbalance problems, which result in a lower detection rate for minority class samples. An improved Synthetic Minority Over-sampling Technique (SMOTE) is introduced for effective imbalanced data classification. After collecting the raw data from PIMA, Yeast, E.coli, and Breast cancer Wisconsin databases, the pre-processing is performed using min-max normalization, cleaning, integration, and data transformation techniques to achieve data with better uniqueness, consistency, completeness and validity. An improved SMOTE algorithm is applied to the pre-processed data for proper data distribution, and then the properly distributed data is fed to the machine learning classifiers: Support Vector Machine (SVM), Random Forest, and Decision Tree for data classification. Experimental examination confirmed that the improved SMOTE algorithm with random forest attained significant classification results with Area under Curve (AUC) of 94.30%, 91%, 96.40%, and 99.40% on the PIMA, Yeast, E.coli, and Breast cancer Wisconsin databases.
Cited by
1 articles.
订阅此论文施引文献
订阅此论文施引文献,注册后可以免费订阅5篇论文的施引文献,订阅后可以查看论文全部施引文献