A Density-Based Random Forest for Imbalanced Data Classification

Published: 2022-03-14 Issue: 3 Volume: 14 Page: 90
ISSN: 1999-5903
Container-title: Future Internet
Language: en
Short-container-title: Future Internet

Author:

Dong Jia, Qian Quan

Abstract

Many machine learning problem domains, such as the detection of fraud, spam, outliers, and anomalies, involve inherently imbalanced class distributions. However, most classification algorithms assume roughly equal sample sizes for each class, so imbalanced datasets pose a significant challenge for predictive modeling. Herein, we propose a density-based random forest algorithm (DBRF) to improve prediction performance, especially for minority classes. DBRF identifies boundary samples as the most difficult to classify and then uses a density-based method to augment them. Subsequently, two different random forest classifiers are constructed to model the augmented boundary samples and the original dataset dependently, and the final output is determined using a bagging technique. A real-world material classification dataset and 33 public imbalanced datasets were used to evaluate the performance of DBRF. Across the 34 datasets, DBRF achieved improvements of 2–15% over random forest in terms of the F1-measure and G-mean. The experimental results demonstrate that, by taking into account the density of objects in space, DBRF can address the problem of classifying objects located on the class boundary, including objects of minority classes.
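The abstract reports results in terms of the F1-measure and G-mean rather than accuracy. The sketch below illustrates why, assuming the usual binary definitions (F1 as the harmonic mean of precision and recall, G-mean as the square root of sensitivity times specificity) with the minority class treated as positive. The abstract does not state the paper's exact formulation, and the toy labels and function names are invented for illustration only.

```python
import math

def confusion_counts(y_true, y_pred, positive=1):
    """Count TP, FP, TN, FN, treating `positive` as the minority class."""
    tp = fp = tn = fn = 0
    for t, p in zip(y_true, y_pred):
        if p == positive:
            if t == positive:
                tp += 1
            else:
                fp += 1
        else:
            if t == positive:
                fn += 1
            else:
                tn += 1
    return tp, fp, tn, fn

def accuracy(tp, fp, tn, fn):
    total = tp + fp + tn + fn
    return (tp + tn) / total if total else 0.0

def f1_measure(tp, fp, tn, fn):
    # F1 = 2*TP / (2*TP + FP + FN); defined as 0 when there are no positives at all.
    denom = 2 * tp + fp + fn
    return 2 * tp / denom if denom else 0.0

def g_mean(tp, fp, tn, fn):
    # Geometric mean of sensitivity (minority recall) and specificity (majority recall).
    sensitivity = tp / (tp + fn) if (tp + fn) else 0.0
    specificity = tn / (tn + fp) if (tn + fp) else 0.0
    return math.sqrt(sensitivity * specificity)

# Hypothetical toy data: 95 majority samples (0) and 5 minority samples (1).
y_true = [0] * 95 + [1] * 5

# Classifier A always predicts the majority class.
always_majority = [0] * 100

# Classifier B raises 5 false alarms but recovers 4 of the 5 minority samples.
better = [0] * 90 + [1] * 5 + [1] * 4 + [0]

for name, y_pred in [("always_majority", always_majority), ("better", better)]:
    counts = confusion_counts(y_true, y_pred)
    print(name,
          "accuracy=%.3f" % accuracy(*counts),
          "F1=%.3f" % f1_measure(*counts),
          "G-mean=%.3f" % g_mean(*counts))
```

On this toy data the trivial classifier has the higher accuracy (0.95 versus 0.94), yet its F1 and G-mean are both 0 because it never finds a minority sample. This is the kind of failure that minority-sensitive metrics are meant to expose.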

Funder

National Key Research and Development Program of China

Key Program of Science and Technology of Yunnan Province

Publisher

MDPI AG

Subject

Computer Networks and Communications

Link

https://www.mdpi.com/1999-5903/14/3/90/pdf

References: 40 articles.

1. On the application of multi-class classification in physical therapy recommendation

2. Enhancing instance-based classification with local density: a new algorithm for classifying unbalanced biomedical data

Cited by 16 articles.

1. Predicting Yield Strength and Plastic Elongation in Body-Centered Cubic High-Entropy Alloys;Materials;2024-09-08

2. A framework of Polar CanisFel optimization-based deep ensemble classifier with graph embedding for imbalanced data classification;Web Intelligence;2024-08-02

3. Machine learning techniques for diagrid building design: Architectural–Structural correlations with feature selection and data augmentation;Journal of Building Engineering;2024-06

4. Imbalanced Data Classification Using Oversampling and Automatic Feature Selection Methods for Undergraduate Student Career Prediction;2024 13th International Conference on Educational and Information Technology (ICEIT);2024-03-22

5. Predicting COVID-19 Outbreaks: Leveraging Machine Learning and Deep Learning Models for Trend Analysis;Lecture Notes in Networks and Systems;2024