A Comparative Study of the Use of Stratified Cross-Validation and Distribution-Balanced Stratified Cross-Validation in Imbalanced Learning

Published: 2023-02-20 Issue: 4 Volume: 23 Page: 2333
ISSN: 1424-8220
Container-title: Sensors
Language: en
Short-container-title: Sensors

Author:

Szeghalmy Szilvia¹, Fazekas Attila¹

Affiliation:

1. Faculty of Informatics, University of Debrecen, H-4028 Debrecen, Hungary

Abstract

Many practical problems are now solved with machine learning tools. However, compiling an appropriate training data set for real-world classification problems is challenging, because collecting enough data for each class is often difficult or even impossible. In such cases, we readily face the problem of imbalanced learning. The literature offers many methods for the imbalanced learning problem, so how to compare their performance has become a serious question. Inadequate validation techniques can give misleading results (e.g., due to data shift), which has led to the development of validation methods designed for imbalanced data sets, such as stratified cross-validation (SCV) and distribution optimally balanced SCV (DOB-SCV). Previous studies have shown that higher classification performance scores (AUC) can be achieved on imbalanced data sets using DOB-SCV instead of SCV. We investigated the effect of oversamplers on this difference. The study was conducted on 420 data sets, involving several sampling methods and the DTree, kNN, SVM, and MLP classifiers. We point out that DOB-SCV often yields slightly higher F1 and AUC values for classification combined with sampling. However, the results also show that the selection of the sampler–classifier pair matters more for classification performance than the choice between DOB-SCV and SCV.
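
The abstract names the two partitioning schemes without describing how they assign samples to folds. The following is a minimal Python sketch of the difference, assuming the DOB-SCV procedure as it is commonly described in the literature: within each class, pick a random unassigned sample and place it and its k-1 nearest unassigned same-class neighbours into k different folds. The function names `scv_folds` and `dob_scv_folds`, the Euclidean distance, and the tie handling are illustrative assumptions, not the authors' implementation.

```python
import math
import random
from collections import defaultdict


def _indices_by_class(labels):
    """Group sample indices by their class label."""
    by_class = defaultdict(list)
    for idx, y in enumerate(labels):
        by_class[y].append(idx)
    return by_class


def scv_folds(labels, k, seed=0):
    """Stratified CV sketch: shuffle each class, deal indices round-robin."""
    rng = random.Random(seed)
    folds = [[] for _ in range(k)]
    for members in _indices_by_class(labels).values():
        members = list(members)
        rng.shuffle(members)
        for pos, idx in enumerate(members):
            folds[pos % k].append(idx)
    return folds


def dob_scv_folds(points, labels, k, seed=0):
    """DOB-SCV sketch (assumed procedure, Euclidean distance).

    Nearby same-class samples are spread over different folds, so each
    fold covers roughly the same regions of the feature space.
    """
    rng = random.Random(seed)
    folds = [[] for _ in range(k)]
    for members in _indices_by_class(labels).values():
        unassigned = list(members)
        while unassigned:
            # Random starting sample of this class.
            start = unassigned.pop(rng.randrange(len(unassigned)))
            # Its k-1 nearest unassigned neighbours of the same class.
            unassigned.sort(key=lambda j: math.dist(points[start], points[j]))
            group = [start] + unassigned[: k - 1]
            del unassigned[: k - 1]
            # One member of the group goes to each fold.
            for fold_no, idx in enumerate(group):
                folds[fold_no].append(idx)
    return folds


# Toy imbalanced data: 10 minority samples (label 1), 40 majority (label 0).
points = [(float(i), 0.0) for i in range(50)]
labels = [1] * 10 + [0] * 40
scv = scv_folds(labels, k=5)
dob = dob_scv_folds(points, labels, k=5)
```

Both functions preserve the class ratio in every fold; they differ only in which same-class samples end up together. In this sketch the first fold receives any leftover samples when a class size is not divisible by k, which a fuller implementation might balance differently.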

Publisher

MDPI AG

Subject

Electrical and Electronic Engineering; Biochemistry; Instrumentation; Atomic and Molecular Physics, and Optics; Analytical Chemistry

Link

https://www.mdpi.com/1424-8220/23/4/2333/pdf

References (57 articles)

1. An efficient fraud detection framework with credit card imbalanced data in financial services;Hemdan;Multimed. Tools Appl.,2023

2. Credit card fraud detection under extreme imbalanced data: A comparative study of data-level algorithms;Singh;J. Exp. Theor. Artif. Intell.,2022

3. A comprehensive data-level investigation of cancer diagnosis on imbalanced data;Gupta;Comput. Intell.,2022

4. A study of data pre-processing techniques for imbalanced biomedical data classification;Liu;Int. J. Bioinform. Res. Appl.,2020

5. A minority oversampling approach for fault detection with heterogeneous imbalanced data;Liu;Expert Syst. Appl.,2021

Cited by 39 articles.

1. Comprehensive analysis of multiple classifiers for enhanced river water quality monitoring with explainable AI;Case Studies in Chemical and Environmental Engineering;2024-12

2. Phenotyping for heat stress tolerance in wheat population using physiological traits, multispectral imagery, and machine learning approaches;Plant Stress;2024-09

3. New insights into the metallogenic genesis of the Xiadian Au deposit, Jiaodong Peninsula, Eastern China: Constraints from integrated rutile in-situ geochemical analysis and machine learning discrimination;Ore Geology Reviews;2024-08

4. Using artificial neural networks and citizen science data to assess jellyfish presence along coastal areas;Journal of Applied Ecology;2024-07-24

5. Detection of fusarium wilt-induced physiological impairment in strawberry plants using hyperspectral imaging and machine learning;Precision Agriculture;2024-07-24