FDR2-BD: A Fast Data Reduction Recommendation Tool for Tabular Big Data Classification Problems

Published: 2021-07-22 Issue: 15 Volume: 10 Page: 1757
ISSN: 2079-9292
Container-title: Electronics
Language: en
Short-container-title: Electronics

Author:

Basgall María José, Naiouf Marcelo, Fernández Alberto

Abstract

This paper presents FDR2-BD, a methodological data condensation approach for reducing tabular big datasets in classification problems. The key to the proposal is analyzing data in a dual way (vertical and horizontal), combining feature selection, which generates dense clusters of data, with uniform sampling reduction, which keeps only a few representative samples from each problem area. Its main advantage is that the model's predictive quality is kept within a range determined by a user-defined threshold. Its robustness comes from a hyper-parametrization process in which all data are taken into consideration by following a k-fold procedure. It is also fast and scalable, relying on fully optimized parallel operations provided by Apache Spark. An extensive experimental study is performed over 25 big datasets with different characteristics. In most cases, the obtained reduction percentages are above 95%, outperforming state-of-the-art solutions such as FCNN_MR, which barely reach 70%. The most promising outcome is that the representativeness of the original data is maintained, with predictive quality values within around 1% of the baseline.
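The threshold idea in the abstract can be illustrated with a minimal sketch. This is not the FDR2-BD implementation: it omits feature selection, the k-fold hyper-parametrization, and Apache Spark, and it swaps in a toy nearest-centroid classifier on synthetic data. The names `recommend_fraction`, `nearest_centroid_accuracy`, and `make_blobs` are hypothetical, chosen only for illustration. The sketch tries uniform sampling fractions from smallest to largest and returns the first one whose accuracy drop from the full-data baseline stays within a user-supplied threshold.

```python
import random


def nearest_centroid_accuracy(train, test):
    """Fit one centroid per class on `train`; return accuracy on `test`."""
    sums, counts = {}, {}
    for x, y in train:
        acc = sums.setdefault(y, [0.0] * len(x))
        for i, v in enumerate(x):
            acc[i] += v
        counts[y] = counts.get(y, 0) + 1
    centroids = {y: [s / counts[y] for s in sums[y]] for y in sums}

    def predict(x):
        # Pick the class whose centroid is closest in squared Euclidean distance.
        return min(
            centroids,
            key=lambda y: sum((a - b) ** 2 for a, b in zip(x, centroids[y])),
        )

    return sum(predict(x) == y for x, y in test) / len(test)


def recommend_fraction(train, test, fractions, threshold, seed=0):
    """Return (fraction, accuracy, baseline) for the smallest uniform-sampling
    fraction whose accuracy drop from the baseline is at most `threshold`.
    Falls back to the full data (fraction 1.0) if no candidate qualifies."""
    rng = random.Random(seed)
    baseline = nearest_centroid_accuracy(train, test)
    for frac in sorted(fractions):
        k = max(2, int(len(train) * frac))
        sample = rng.sample(train, k)  # uniform sampling without replacement
        acc = nearest_centroid_accuracy(sample, test)
        if baseline - acc <= threshold:
            return frac, acc, baseline
    return 1.0, baseline, baseline


def make_blobs(n, rng):
    """Synthetic two-class data (an assumption for the demo, not from the paper)."""
    data = []
    for _ in range(n):
        label = rng.randrange(2)
        center = 0.0 if label == 0 else 4.0
        data.append(([rng.gauss(center, 1.0), rng.gauss(center, 1.0)], label))
    return data


rng = random.Random(42)
train, test = make_blobs(2000, rng), make_blobs(500, rng)
candidates = [0.01, 0.05, 0.2, 0.5]
frac, acc, baseline = recommend_fraction(train, test, candidates, threshold=0.01)
print(f"keep {frac:.0%} of the data: accuracy {acc:.3f} vs baseline {baseline:.3f}")
```

Here a threshold of 0.01 corresponds to tolerating a one-percentage-point accuracy loss. A real pipeline would evaluate each candidate over several folds rather than a single train/test split.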

Publisher

MDPI AG

Subject

Electrical and Electronic Engineering, Computer Networks and Communications, Hardware and Architecture, Signal Processing, Control and Systems Engineering

Link

https://www.mdpi.com/2079-9292/10/15/1757/pdf

Reference: 44 articles.

1. Data mining with big data

2. Industry 4.0: A survey on technologies, applications and open research issues

3. A Review of Data-Driven Decision-Making Methods for Industry 4.0 Maintenance Applications

4. Beyond the hype: Big data concepts, methods, and analytics

5. Access methods for Big Data: current status and future directions

Cited by 5 articles.

1. Learning Discriminative Features Using ANN-based Progressive Learning Model for Efficient Big Data Classification;Pertanika Journal of Science and Technology;2024-08-08

2. PUB-VEN: a personalized recommendation system for suggesting publication venues;Multimedia Tools and Applications;2023-10-14

3. Analysis and design of scalable pre-processing techniques of instances for imbalanced Big Data problems. Applications in humanitarian emergencies situations.;Journal of Computer Science and Technology;2022-10-17

4. Magnetic Force Classifier: A Novel Method for Big Data Classification;IEEE Access;2022

5. Intrusion Detection Model for Imbalanced Dataset Using SMOTE and Random Forest Algorithm;Communications in Computer and Information Science;2021