Author:
Camattari Fabiana, Guastavino Sabrina, Marchetti Francesco, Piana Michele, Perracchione Emma
Abstract
The purpose of this study is to introduce a new approach to feature ranking for classification tasks, referred to in what follows as greedy feature selection. In statistical learning, feature selection is usually carried out by methods that are independent of the classifier that later performs the prediction on the reduced set of features. In contrast, greedy feature selection identifies the most important feature at each step according to the selected classifier. The benefits of such a scheme are investigated in terms of model capacity indicators, such as the Vapnik-Chervonenkis dimension or the kernel alignment. This theoretical study proves that the iterative greedy algorithm is able to construct classifiers whose complexity capacity grows at each step. The proposed method is then tested numerically on various datasets and compared with state-of-the-art techniques. The results show that the iterative scheme is indeed able to capture only a few relevant features and may improve, especially for real and noisy data, on the accuracy scores of other techniques. The greedy scheme is also applied to the challenging task of predicting geo-effective manifestations of the active Sun.
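The greedy scheme described in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation: the nearest-centroid classifier, the validation-accuracy criterion, the synthetic data, and all names (`nearest_centroid_accuracy`, `greedy_feature_ranking`, `make_toy_data`) are assumptions chosen only to keep the example self-contained. It shows the general idea that, at each step, the feature added to the ranking is the one that most improves the score of the chosen classifier.

```python
import random


def nearest_centroid_accuracy(X_train, y_train, X_val, y_val, features):
    """Validation accuracy of a nearest-centroid classifier restricted to `features`.

    Hypothetical stand-in for "the selected classifier"; any scoring
    function with the same signature could be used instead.
    """
    sums, counts = {}, {}
    for row, label in zip(X_train, y_train):
        acc = sums.setdefault(label, [0.0] * len(features))
        for k, j in enumerate(features):
            acc[k] += row[j]
        counts[label] = counts.get(label, 0) + 1
    centroids = {c: [s / counts[c] for s in sums[c]] for c in sums}

    def squared_distance(row, centroid):
        return sum((row[j] - centroid[k]) ** 2 for k, j in enumerate(features))

    correct = 0
    for row, label in zip(X_val, y_val):
        predicted = min(centroids, key=lambda c: squared_distance(row, centroids[c]))
        correct += predicted == label
    return correct / len(y_val)


def greedy_feature_ranking(X_train, y_train, X_val, y_val, score_fn, n_steps):
    """Rank features by repeatedly adding the one that maximizes `score_fn`."""
    selected, history = [], []
    remaining = list(range(len(X_train[0])))
    for _ in range(min(n_steps, len(remaining))):
        # Score every candidate together with the features already selected.
        candidate_scores = {
            j: score_fn(X_train, y_train, X_val, y_val, selected + [j])
            for j in remaining
        }
        best = max(remaining, key=lambda j: candidate_scores[j])
        selected.append(best)
        remaining.remove(best)
        history.append(candidate_scores[best])
    return selected, history


def make_toy_data(n_samples, rng):
    """Synthetic data (assumption): feature 0 separates the classes, 1 and 2 are noise."""
    X, y = [], []
    for i in range(n_samples):
        label = i % 2
        X.append([5.0 * label + 0.3 * rng.gauss(0, 1), rng.gauss(0, 1), rng.gauss(0, 1)])
        y.append(label)
    return X, y


rng = random.Random(0)
X_train, y_train = make_toy_data(200, rng)
X_val, y_val = make_toy_data(100, rng)

ranking, scores = greedy_feature_ranking(
    X_train, y_train, X_val, y_val, nearest_centroid_accuracy, n_steps=3
)
print("feature ranking:", ranking)
print("validation accuracy after each step:", scores)
```

Because the score is computed with the classifier itself, swapping `nearest_centroid_accuracy` for another scoring function changes the ranking criterion without touching the greedy loop, which is the classifier-dependent behaviour the abstract contrasts with classifier-independent selection methods.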
Funder
Università degli Studi di Genova
Publisher
Springer Science and Business Media LLC