Self-Boosted With Dynamic Semi-Supervised Clustering Method for Imbalanced Big Data Classification

Published:2022-05-06 Issue:1 Volume:10 Page:1-24
ISSN:2166-7160
Container-title:International Journal of Software Innovation
language:en
Author:

Abhilasha Akkala¹, Annan Naidu P.¹

Affiliation:

1. Centurion University of Technology and Management, India

Abstract

Big data plays a major role in the learning, manipulation, and forecasting of information intelligence. Because the data are imbalanced, learning and retrieving information from such large datasets can result in poor classification outcomes and wrong decisions. Traditional machine learning classifiers can handle imbalanced datasets, but they still fall short on overfitting, training cost, and sample hardness in classification. To achieve better classification, the research work proposes the novel “Self-Boosted with Dynamic Semi-Supervised Clustering Method”. The data are first preprocessed by constructing sample blocks using Hybrid Associated Nearest Neighbor heuristic over-sampling, which replicates the minority samples and merges each copy with every subset of the majority samples. This removes the overfitting issue and slightly reduces noise in the imbalanced data. After preprocessing, classifying the massive data requires a large data space, which leads to large training costs.
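The block-construction step described in the abstract can be illustrated with a minimal sketch. It is not the authors' Hybrid Associated Nearest Neighbor heuristic: it assumes a plain random partition of the majority class, and the function name `build_sample_blocks`, its parameters, and the toy data are hypothetical. It only shows the idea of merging one copy of the minority samples with each majority subset so that every block is closer to balanced.

```python
import random


def build_sample_blocks(majority, minority, n_blocks, seed=0):
    """Pair every majority subset with a full copy of the minority class.

    Assumption: the majority class is split by a random round-robin
    partition, not by the nearest-neighbor heuristic named in the abstract.
    Labels are 0 for majority samples and 1 for minority samples.
    """
    if n_blocks < 1:
        raise ValueError("n_blocks must be at least 1")
    rng = random.Random(seed)
    shuffled = list(majority)
    rng.shuffle(shuffled)
    # Round-robin slicing gives n_blocks disjoint majority subsets.
    subsets = [shuffled[i::n_blocks] for i in range(n_blocks)]
    blocks = []
    for subset in subsets:
        # Each block holds one majority subset plus one replicated
        # copy of all minority samples.
        block = [(x, 0) for x in subset] + [(x, 1) for x in minority]
        blocks.append(block)
    return blocks


# Toy data with a 9:1 imbalance (hypothetical values).
majority = [[float(i), float(i % 7)] for i in range(90)]
minority = [[100.0 + i, 1.0] for i in range(10)]

blocks = build_sample_blocks(majority, minority, n_blocks=9)
for index, block in enumerate(blocks):
    n_minority = sum(label for _, label in block)
    print(index, len(block) - n_minority, n_minority)
```

With 90 majority samples, 10 minority samples, and 9 blocks, each block holds equal numbers of both classes, so a classifier trained per block sees balanced data without discarding any majority sample.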

Publisher

IGI Global

Subject

Artificial Intelligence, Computer Graphics and Computer-Aided Design, Computer Networks and Communications, Computer Science Applications, Software

Reference: 25 articles.

1. Basgall, M. J., Hasperué, W., Naiouf, M., Fernández, A., & Herrera, F. (2018). SMOTE-BD: An exact and scalable oversampling method for imbalanced classification in big data. VI Jornadas de Cloud Computing & Big Data (JCC&BD).

2. An Analysis of Local and Global Solutions to Address Big Data Imbalanced Classification: A Case Study with SMOTE Preprocessing

3. K-means Bayes algorithm for imbalanced fault classification and big data application

4. CHI-BD: A fuzzy rule-based classification system for Big Data classification problems

5. A Pareto-based Ensemble with Feature and Instance Selection for Learning from Multi-Class Imbalanced Datasets