Empirical Evaluation of Map Reduce Based Hybrid Approach for Problem of Imbalanced Classification in Big Data

Published: 2019-07 Issue: 3 Volume: 11 Page: 23-45
ISSN: 1938-0259
Container-title: International Journal of Grid and High Performance Computing
Language: en

Author:

Ahlawat Khyati¹, Chug Anuradha², Singh Amit Prakash²

Affiliation:

1. IGDTUW, Delhi, India

2. GGSIPU, Delhi, India

Abstract

Imbalanced datasets are those with an uneven distribution of classes, which deteriorates a classifier's performance. In this paper, the SVM classifier is combined with the K-Means clustering approach, and a hybrid approach, Hy_SVM_KM, is introduced. The performance of the proposed method is empirically evaluated using the Accuracy and FN Rate measures and compared with existing methods such as SMOTE. The results show that the proposed hybrid technique outperforms the traditional machine learning classifier SVM on most datasets and performs better than the known pre-processing technique SMOTE on all datasets. The goal of this article is to extend the capabilities of popular machine learning algorithms and adapt them to meet the challenges of imbalanced big data classification. This article can provide a baseline study for future research on imbalanced big dataset classification. It provides an efficient mechanism for dealing with the imbalanced nature of big datasets using a modified SVM classifier and improves the overall performance of the model.
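
The two measures named in the abstract can be illustrated with a small sketch. This record does not give the paper's exact definitions, so the code below assumes the standard ones: Accuracy = (TP + TN) / total and FN Rate = FN / (FN + TP), with the minority class treated as positive. The function names and the toy data are hypothetical and are not taken from the paper. The example shows why accuracy alone can be misleading on an imbalanced dataset.

```python
def confusion_counts(y_true, y_pred, positive=1):
    """Count TP, FN, FP, TN, treating `positive` as the minority class."""
    tp = fn = fp = tn = 0
    for truth, pred in zip(y_true, y_pred):
        if truth == positive:
            if pred == positive:
                tp += 1
            else:
                fn += 1
        else:
            if pred == positive:
                fp += 1
            else:
                tn += 1
    return tp, fn, fp, tn


def accuracy(tp, fn, fp, tn):
    """Fraction of all samples classified correctly (assumed standard definition)."""
    total = tp + fn + fp + tn
    return (tp + tn) / total if total else 0.0


def fn_rate(tp, fn):
    """Fraction of actual positives that were missed (assumed standard definition)."""
    positives = tp + fn
    return fn / positives if positives else 0.0


# Hypothetical toy data: 95 majority-class (0) and 5 minority-class (1) samples.
y_true = [0] * 95 + [1] * 5
# A degenerate classifier that always predicts the majority class.
y_pred = [0] * 100

tp, fn, fp, tn = confusion_counts(y_true, y_pred)
print(accuracy(tp, fn, fp, tn))  # 0.95
print(fn_rate(tp, fn))           # 1.0
```

In this toy case the classifier reaches 95% accuracy while missing every minority-class sample, which is why a measure such as FN Rate is reported alongside Accuracy for imbalanced data.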

Publisher

IGI Global

Subject

Computer Networks and Communications

References: 44 articles.

1. Applying Support Vector Machines to Imbalanced Datasets

2. Efficient Machine Learning for Big Data: A Review

3. Class Imbalance Learning Methods for Support Vector Machines

4. A MapReduce solution for associative classification of big data

5. Improving execution speed of incremental runs of MapReduce using provenance

Cited by 4 articles.

1. Mobile Information System of English Teaching Ability Based on Big Data Fuzzy K-Means Clustering;Mobile Information Systems;2021-07-01

2. A Novel Hybrid Sampling Algorithm for Solving Class Imbalance Problem in Big Data;Advances in Data Science and Adaptive Analysis;2021-04

3. An Insight on the Class Imbalance Problem and Its Solutions in Big Data;Large-Scale Data Streaming, Processing, and Blockchain Security;2021

4. Virus de ácido ribonucleico (ARN) y coronavirus en Google Dataset Search: alcance y correlación epidemiológica;El profesional de la información;2020-12-21