An oversampling algorithm of multi-label data based on cluster-specific samples and fuzzy rough set theory

Published: 2024-06-06
Issue: 5
Volume: 10
Pages: 6267-6282
ISSN: 2199-4536
Container-title: Complex & Intelligent Systems
Language: en
Short-container-title: Complex Intell. Syst.

Author:

Liu Jinming, Huang Kai, Chen Chen, Mao Jian

Abstract

Imbalanced class distributions are common in real-world scenarios, including datasets with multiple labels. One widely acknowledged approach to addressing imbalanced distributions is oversampling, a technique that both balances the class distribution and improves the effectiveness of classification models. However, generating synthetic data for multi-label datasets is complicated by the presence of multiple label sets: each synthetic sample must be both placed and labeled with care. We propose MLCSMOTE-FRST, an algorithm for synthetic data generation based on label-specific clustering and fuzzy rough set theory. Generation ratios and dependency samples are provided by clusters specific to each label, with a focus on the overall label distribution and the distribution within each cluster. The labels are supported by intra-cluster positive samples, determined using fuzzy rough set theory, which helps to capture the consensus label set. Experimental results on multi-label datasets using four classifiers demonstrate the effectiveness of the proposed method in terms of macro-F1 and micro-F1 scores.
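To make the two decisions named in the abstract concrete (where to place a synthetic sample and which labels to give it), here is a minimal sketch of a generic SMOTE-style oversampler for a single minority label. It is not the MLCSMOTE-FRST procedure: it uses no label-specific clustering and no fuzzy rough set theory, and the function name `oversample_label`, its parameters, and the strict-majority voting rule are assumptions chosen for illustration only.

```python
import numpy as np

def oversample_label(X, Y, label, n_new, k=3, seed=0):
    """Generate n_new synthetic samples for one minority label.

    X: (n, d) float feature matrix; Y: (n, q) 0/1 label matrix.
    Generic SMOTE-style interpolation (illustrative assumption),
    NOT the clustering / fuzzy-rough-set method of the paper.
    """
    rng = np.random.default_rng(seed)
    pos = np.flatnonzero(Y[:, label] == 1)   # samples carrying the label
    if pos.size < 2:
        raise ValueError("need at least two positive samples")
    k = min(k, pos.size - 1)
    Xp, Yp = X[pos], Y[pos]

    # Pairwise Euclidean distances among the positive samples.
    dist = np.linalg.norm(Xp[:, None, :] - Xp[None, :, :], axis=2)
    np.fill_diagonal(dist, np.inf)           # a sample is not its own neighbour
    nn = np.argsort(dist, axis=1)[:, :k]     # k nearest positives per sample

    X_new = np.empty((n_new, X.shape[1]))
    Y_new = np.empty((n_new, Y.shape[1]), dtype=int)
    for i in range(n_new):
        s = rng.integers(pos.size)           # seed sample
        r = rng.choice(nn[s])                # one of its neighbours
        gap = rng.random()                   # interpolation weight in [0, 1)
        # Placement: a point on the segment between seed and neighbour.
        X_new[i] = Xp[s] + gap * (Xp[r] - Xp[s])
        # Labeling: strict majority vote over the seed and its k neighbours.
        votes = Yp[np.r_[s, nn[s]]].mean(axis=0)
        Y_new[i] = (votes > 0.5).astype(int)
    return X_new, Y_new

# Toy data: label 0 is carried by the three points near the origin.
X = np.array([[0.0, 0.0], [1.0, 0.0], [0.0, 1.0],
              [5.0, 5.0], [6.0, 5.0], [5.0, 6.0]])
Y = np.array([[1, 0], [1, 1], [1, 0],
              [0, 1], [0, 1], [0, 1]])
X_new, Y_new = oversample_label(X, Y, label=0, n_new=4, k=2)
```

In this toy run every synthetic point falls inside the unit square spanned by the three positive samples, and the vote keeps label 0 while dropping label 1, which only one of the three positives carries. The abstract indicates that the proposed method replaces both steps with cluster-specific generation ratios and fuzzy-rough-set-based label support, which this sketch does not attempt to reproduce.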

Funder

Natural Science Foundation of Xiamen Municipality

Natural Science Foundation of Fujian Province

Department of Education, Fujian Province

Publisher

Springer Science and Business Media LLC

Link

https://link.springer.com/content/pdf/10.1007/s40747-024-01498-w.pdf

Cited by 1 article.

1. MLAWSMOTE: Oversampling in Imbalanced Multi-label Classification with Missing Labels by Learning Label Correlation Matrix. International Journal of Computational Intelligence Systems, 2024-08-05