Abstract
Large, high-dimensional datasets with noisy labels have become increasingly prevalent in industry. These datasets often contain errors or inconsistencies in the assigned labels and involve a vast number of predictive variables. Such issues frequently arise in real-world settings because of uncertainty or human error during data collection and annotation. Noisy labels and high dimensionality can significantly impair the generalization ability and accuracy of trained models. To address these issues, we introduce a simple-structured penalized γ-divergence model and a novel meta-gradient correction algorithm, and we establish the foundations of both modules with rigorous theoretical proofs. Finally, comprehensive experiments validate their effectiveness in detecting noisy labels and mitigating the curse of dimensionality, and the results suggest that the proposed model and algorithm achieve promising outcomes. Moreover, we open-source our code and distinctive datasets on GitHub (refer to https://github.com/DebtVC2022/Robust_Learning_with_MGC).
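The abstract names a penalized γ-divergence model for robustness to noisy labels. As a hedged illustration only (not the paper's implementation), the per-sample γ-cross-entropy loss commonly derived from the γ-divergence can be sketched as below; the function name is our own, the constant offset is chosen so that the loss recovers standard cross-entropy as γ → 0, and the paper's penalized variant would add a regularization term on top of this.

```python
import numpy as np

def gamma_ce_loss(probs, y, gamma=0.5):
    """Hypothetical sketch of a gamma-cross-entropy loss for one sample.

    probs: 1-D array of predicted class probabilities (sums to 1)
    y:     index of the observed (possibly noisy) label
    gamma: robustness parameter; larger gamma down-weights
           samples whose label disagrees with a confident prediction,
           and gamma -> 0 recovers the ordinary cross-entropy.
    """
    probs = np.asarray(probs, dtype=float)
    # Normalizing term (sum_k p_k^(1+gamma))^(gamma/(1+gamma))
    norm = np.sum(probs ** (1.0 + gamma)) ** (gamma / (1.0 + gamma))
    # Shifted so the limit gamma -> 0 equals -log p_y
    return (1.0 / gamma) * (1.0 - probs[y] ** gamma / norm)

# Confident correct prediction incurs a much smaller loss than an
# uncertain one, and the penalty for a wrong label saturates rather
# than exploding, which is the source of the robustness.
confident = gamma_ce_loss([0.9, 0.05, 0.05], y=0)
uncertain = gamma_ce_loss([0.4, 0.3, 0.3], y=0)
```

In this sketch the bounded behavior of `probs[y] ** gamma` for mislabeled, confidently misclassified samples is what limits the influence of label noise, in contrast to the unbounded `-log p_y` of standard cross-entropy.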
Publisher
Public Library of Science (PLoS)