Feature ranking for semi-supervised learning

Published: 2022-06-22
ISSN: 0885-6125
Container-title: Machine Learning
Language: en
Short-container-title: Mach Learn

Author:

Petković Matej, Džeroski Sašo, Kocev Dragi

Abstract

The data used for analysis are becoming increasingly complex along several directions: high dimensionality, number of examples and availability of labels for the examples. This poses a variety of challenges for the existing machine learning methods, related to analyzing datasets with a large number of examples that are described in a high-dimensional space, where not all examples have labels provided. For example, when investigating the toxicity of chemical compounds, there are many compounds available that can be described with information-rich high-dimensional representations, but not all of the compounds have information on their toxicity. To address these challenges, we propose methods for semi-supervised learning (SSL) of feature rankings. The feature rankings are learned in the context of classification and regression, as well as in the context of structured output prediction tasks: multi-label classification (MLC), hierarchical multi-label classification (HMLC) and multi-target regression (MTR). This is the first work that treats the task of feature ranking uniformly across various tasks of semi-supervised structured output prediction. To the best of our knowledge, it is also the first work on SSL of feature rankings for the tasks of HMLC and MTR. More specifically, we propose two approaches, based on predictive clustering tree ensembles and the Relief family of algorithms, and evaluate their performance across 38 benchmark datasets. The extensive evaluation reveals that rankings based on Random Forest ensembles perform the best for classification tasks (including MLC and HMLC tasks) and are the fastest for all tasks, while ensembles based on extremely randomized trees work best for the regression tasks. Semi-supervised feature rankings outperform their supervised counterparts across the majority of datasets for all of the different tasks, showing the benefit of using unlabeled in addition to labeled data.
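For orientation, the Relief family named in the abstract scores a feature by how much it differs between an example and its nearest neighbour of another class (the nearest miss), minus how much it differs from its nearest neighbour of the same class (the nearest hit). The following is a minimal sketch of the basic supervised Relief for binary classification with numeric features. It is not the semi-supervised variant proposed in the paper, whose details this record does not give. The names `relief_weights` and `rank_features` are illustrative, and the sketch visits every example once instead of sampling, which is an assumption made to keep it deterministic.

```python
def relief_weights(X, y):
    """Basic Relief weights for numeric features and binary labels.

    X is a list of equal-length feature vectors, y the matching labels.
    Illustrative sketch only: every example is visited once.
    """
    n, d = len(X), len(X[0])

    # Per-feature value ranges, used to scale differences into [0, 1].
    ranges = []
    for j in range(d):
        column = [row[j] for row in X]
        ranges.append((max(column) - min(column)) or 1.0)

    def diff(j, a, b):
        return abs(a[j] - b[j]) / ranges[j]

    def distance(a, b):
        return sum(diff(j, a, b) for j in range(d))

    weights = [0.0] * d
    for i in range(n):
        hits = [k for k in range(n) if k != i and y[k] == y[i]]
        misses = [k for k in range(n) if y[k] != y[i]]
        if not hits or not misses:
            continue  # no same-class or other-class neighbour available
        hit = min(hits, key=lambda k: distance(X[i], X[k]))
        miss = min(misses, key=lambda k: distance(X[i], X[k]))
        for j in range(d):
            # Reward separation from the miss, penalise distance to the hit.
            weights[j] += (diff(j, X[i], X[miss]) - diff(j, X[i], X[hit])) / n
    return weights


def rank_features(weights):
    """Return feature indices ordered from most to least relevant."""
    return sorted(range(len(weights)), key=lambda j: -weights[j])
```

Calling `rank_features(relief_weights(X, y))` on a labeled dataset returns feature indices with the highest-weighted feature first. A feature that separates the classes should receive a larger weight than one that varies independently of the label.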

Funder

Horizon 2020 Framework Programme

Javna Agencija za Raziskovalno Dejavnost RS

Publisher

Springer Science and Business Media LLC

Subject

Artificial Intelligence, Software

Link

https://link.springer.com/content/pdf/10.1007/s10994-022-06181-0.pdf

References (69 articles)

1. Alalga, A., Benabdeslem, K., & Taleb, N. (2016). Soft-constrained Laplacian score for semi-supervised multi-label feature selection. Knowledge and Information Systems, 47(1), 75–98.

2. Arthur, D., & Vassilvitskii, S. (2007). K-means++: The advantages of careful seeding. In Proceedings of the eighteenth annual ACM-SIAM symposium on discrete algorithms, SODA ’07 (pp. 1027–1035), USA. Society for Industrial and Applied Mathematics.

3. Bellal, F., Elghazel, H., & Aussem, A. (2012). A semi-supervised feature ranking method with ensemble learning. Pattern Recognition Letters, 33(10), 1426–1433.

4. Bhardwaj, K., & Patra, S. (2018). An unsupervised technique for optimal feature selection in attribute profiles for spectral-spatial classification of hyperspectral images. ISPRS Journal of Photogrammetry and Remote Sensing, 138, 139–150.

5. Bilkent University. (2020). Function approximation repository. Accessible at http://funapp.cs.bilkent.edu.tr/DataSets/.

Cited by 5 articles.

1. Advancing toxicity studies of per- and poly-fluoroalkyl substances (PFASs) through machine learning: Models, mechanisms, and future directions;Science of The Total Environment;2024-10

2. An Efficient Anomaly Detection Method for Industrial Control Systems: Deep Convolutional Autoencoding Transformer Network;International Journal of Intelligent Systems;2024-05-29

3. Semi-Supervised Predictive Clustering Trees for (Hierarchical) Multi-Label Classification;International Journal of Intelligent Systems;2024-04-13

4. Efficient Feature Ranking and Selection Using Statistical Moments;IEEE Access;2024

5. CLUSplus: A decision tree-based framework for predicting structured outputs;SoftwareX;2023-12