Imbalanced data classification using improved synthetic minority over-sampling technique

Published:2023-10-06 Issue:2 Volume:19 Page:117-131
ISSN:1875-9076
Container-title:Multiagent and Grid Systems
Short-container-title:MGS

Author:

Anusha Yamijala¹, Visalakshi R.², Srinivas Konda³

Affiliation:

1. Department of Computer Science and Engineering, Annamalai University, Chidambaram, India

2. Department of Information Technology, Annamalai University, Chidambaram, India

3. Department of Computer Science and Engineering (Data Science), CMR Technical Campus, Hyderabad, India

Abstract

In data mining, deep learning and machine learning models face class imbalance problems, which result in a lower detection rate for minority class samples. An improved Synthetic Minority Over-sampling Technique (SMOTE) is introduced for effective imbalanced data classification. Raw data are collected from the PIMA, Yeast, E. coli, and Breast Cancer Wisconsin databases and pre-processed using min-max normalization, cleaning, integration, and data transformation to obtain data with better uniqueness, consistency, completeness, and validity. The improved SMOTE algorithm is applied to the pre-processed data to balance the class distribution, and the balanced data are then fed to three machine learning classifiers: Support Vector Machine (SVM), Random Forest, and Decision Tree. Experiments confirmed that the improved SMOTE algorithm with Random Forest attained significant classification results, with an Area Under the Curve (AUC) of 94.30%, 91%, 96.40%, and 99.40% on the PIMA, Yeast, E. coli, and Breast Cancer Wisconsin databases, respectively.
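The abstract does not describe how the improved SMOTE differs from the original algorithm, so the following is only a minimal sketch of two standard building blocks it mentions: min-max normalization and classic SMOTE interpolation between a minority sample and one of its k nearest minority neighbours. The function names `min_max_normalize` and `smote`, the parameter defaults, and the toy data are illustrative assumptions, not the authors' implementation.

```python
import numpy as np


def min_max_normalize(X):
    """Scale each feature column to [0, 1]; constant columns map to 0."""
    X = np.asarray(X, dtype=float)
    lo = X.min(axis=0)
    span = X.max(axis=0) - lo
    span[span == 0] = 1.0  # avoid division by zero for constant features
    return (X - lo) / span


def smote(X_min, n_new, k=5, seed=0):
    """Classic SMOTE (not the paper's improved variant).

    Each synthetic sample lies on the line segment between a randomly
    chosen minority sample and one of its k nearest minority neighbours.
    """
    X_min = np.asarray(X_min, dtype=float)
    n = len(X_min)
    if n < 2:
        raise ValueError("need at least two minority samples")
    k = min(k, n - 1)
    rng = np.random.default_rng(seed)

    # Pairwise Euclidean distances among minority samples.
    dist = np.linalg.norm(X_min[:, None, :] - X_min[None, :, :], axis=2)
    np.fill_diagonal(dist, np.inf)  # a sample is not its own neighbour
    neighbours = np.argsort(dist, axis=1)[:, :k]

    base = rng.integers(0, n, size=n_new)            # seed samples
    pick = neighbours[base, rng.integers(0, k, size=n_new)]
    gap = rng.random((n_new, 1))                     # interpolation factor in [0, 1)
    return X_min[base] + gap * (X_min[pick] - X_min[base])


if __name__ == "__main__":
    # Toy minority-class data (hypothetical values, for illustration only).
    minority = np.array([[1.0, 10.0], [2.0, 20.0], [3.0, 30.0], [4.0, 25.0]])
    scaled = min_max_normalize(minority)
    synthetic = smote(scaled, n_new=6, k=2)
    print(synthetic.shape)
```

In a full pipeline of the kind the abstract outlines, the synthetic samples would be appended to the training split only, before fitting a classifier such as a random forest and scoring it with AUC on untouched test data.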

Publisher

IOS Press

Subject

General Computer Science

References: 42 articles.

Cited by 1 article.

1. Classification of imbalanced datasets utilizing the synthetic minority oversampling method in conjunction with several machine learning techniques;Iran Journal of Computer Science;2024-09-11