Affiliations:
1. School of Mathematics and Physics, Hebei University of Engineering, Handan 056038, China
2. Laboratory for Space Environment and Physical Sciences, Harbin Institute of Technology, Harbin 150001, China
Abstract
Class-imbalanced data are common in pattern analysis, machine learning, and many real-world applications. Conventional approaches often rely on resampling to address the imbalance, which inevitably alters the original data distribution. This paper proposes a novel classification method that uses optimal transport to handle imbalanced data. Specifically, drawing on optimal transport theory, we establish a transport plan between the training and testing data without modifying the original data distribution. We also introduce a non-convex interclass regularization term that links testing samples to training samples with the same class labels. This term forms the basis of a regularized discrete optimal transport model for imbalanced classification. Following the concept of maximum minimization, a maximum minimization algorithm is then introduced to solve the regularized discrete optimal transport model. Experiments on 17 KEEL datasets with varying levels of imbalance show that the proposed approach outperforms 11 other widely used techniques for class-imbalanced classification. An application to water quality evaluation further confirms its effectiveness.
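The idea of classifying through a transport plan between training and testing samples can be illustrated with a minimal sketch. This is not the authors' model: it replaces the non-convex interclass regularizer and the maximum minimization algorithm with plain entropic regularization solved by Sinkhorn iterations, and it assumes uniform sample masses and a squared Euclidean cost. The function names `sinkhorn_plan` and `classify_by_transport`, the parameter `eps`, and the toy data are all hypothetical.

```python
import math

def sinkhorn_plan(cost, a, b, eps=0.1, n_iter=500):
    """Entropic-regularized discrete OT plan (a stand-in, not the paper's regularizer).

    cost[i][j] is the cost of moving mass from training sample i to testing
    sample j; a and b are the marginal masses of the two sample sets.
    """
    n, m = len(a), len(b)
    kernel = [[math.exp(-cost[i][j] / eps) for j in range(m)] for i in range(n)]
    u, v = [1.0] * n, [1.0] * m
    for _ in range(n_iter):
        # Alternately rescale rows and columns to match the marginals.
        u = [a[i] / sum(kernel[i][j] * v[j] for j in range(m)) for i in range(n)]
        v = [b[j] / sum(kernel[i][j] * u[i] for i in range(n)) for j in range(m)]
    return [[u[i] * kernel[i][j] * v[j] for j in range(m)] for i in range(n)]

def classify_by_transport(train_x, train_y, test_x, eps=0.1):
    """Label each testing sample by the class that sends it the most mass."""
    n, m = len(train_x), len(test_x)
    # Squared Euclidean cost between every training and testing sample (assumption).
    cost = [[sum((p - q) ** 2 for p, q in zip(xi, xj)) for xj in test_x]
            for xi in train_x]
    # Uniform masses: the samples themselves are neither resampled nor moved.
    plan = sinkhorn_plan(cost, [1.0 / n] * n, [1.0 / m] * m, eps)
    classes = sorted(set(train_y))
    predictions = []
    for j in range(m):
        received = {c: 0.0 for c in classes}
        for i in range(n):
            received[train_y[i]] += plan[i][j]
        predictions.append(max(classes, key=lambda c: received[c]))
    return predictions

# Toy imbalanced data: four majority samples (class 0), two minority samples (class 1).
train_x = [(0.0, 0.0), (0.1, 0.0), (0.0, 0.1), (0.1, 0.1), (1.0, 1.0), (0.9, 1.0)]
train_y = [0, 0, 0, 0, 1, 1]
test_x = [(0.05, 0.05), (0.0, 0.05), (0.95, 0.95)]
predictions = classify_by_transport(train_x, train_y, test_x)
```

In this toy case the class proportions of the testing set match those of the training set, so the plan stays almost entirely within each class. When the proportions differ, an unregularized or entropic plan is forced to route majority-class mass to minority testing samples, which is the kind of situation an interclass regularization term is meant to address.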
Funder
National Science Foundation of China
Hebei Natural Science Foundation
Natural Science Foundation of Scientific Research Project of Higher Education in Hebei Province
333 Talent Supported Project of Hebei Province