Abstract
The curse of dimensionality arises when data are high-dimensional: it hampers the learning process and reduces accuracy. Feature selection is a dimensionality reduction approach that mitigates the curse of dimensionality by retaining only the relevant features. Irrelevant features are dependent or redundant features that add noise to the data and thereby reduce its quality. The best-known feature selection methods are wrapper and filter techniques. However, wrapper techniques are computationally expensive, whereas filter methods suffer from multicollinearity. In this study, four new feature selection methods based on outlier detection using the Projection Pursuit method are proposed. Outlier detection identifies abnormal data; here, the abnormal data are the irrelevant features, which correspond to rows of the transpose of the original dataset matrix. Outlier detection using projection pursuit has proved efficient in many applications but has not yet been used as a feature selection approach. To the author’s knowledge, this study is the first of its kind. Experimental results on nineteen real datasets using three classifiers (k-NN, SVM, and Random Forest) indicated that the proposed methods improved classification accuracy by an average of 6.64% compared with classification without feature selection. They also outperformed state-of-the-art methods on most of the datasets, with improvements ranging from 0.76% to 30.64%. Statistical analysis showed that the results of the proposed methods are statistically significant.
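To illustrate the general idea of treating features as observations and flagging the outlying ones, the following is a minimal sketch only. It is not one of the four proposed methods, whose details the abstract does not give. It assumes a Stahel-Donoho-style outlyingness score computed over random projection directions, a common projection-pursuit outlier measure. The function names, the number of directions, and the rule of dropping a fixed number of features are all hypothetical choices made for illustration.

```python
import numpy as np


def projection_outlyingness(points, n_directions=500, seed=0):
    """Score each row of `points` by its worst-case robust z-score.

    Assumption: a Stahel-Donoho-style measure, i.e. the maximum over random
    unit directions of |projection - median| / MAD. This is an illustrative
    stand-in, not the index used in the paper.
    """
    rng = np.random.default_rng(seed)
    directions = rng.standard_normal((n_directions, points.shape[1]))
    directions /= np.linalg.norm(directions, axis=1, keepdims=True)

    proj = points @ directions.T                  # (n_points, n_directions)
    med = np.median(proj, axis=0)                 # robust location per direction
    mad = np.median(np.abs(proj - med), axis=0)   # robust scale per direction
    mad = np.where(mad > 0, mad, np.finfo(float).eps)  # guard against zero scale
    return np.max(np.abs(proj - med) / mad, axis=1)


def select_features(X, n_drop, **kwargs):
    """Drop the `n_drop` most outlying features of X (samples x features).

    Each feature (column of X) becomes one observation (row of X.T), so the
    outlier detector ranks features rather than samples.
    """
    scores = projection_outlyingness(X.T, **kwargs)
    n_keep = X.shape[1] - n_drop
    keep = np.sort(np.argsort(scores)[:n_keep])   # indices of retained features
    return keep, scores
```

Calling `select_features(X, n_drop=1)` on a samples-by-features array returns the indices of the retained columns and the per-feature scores. The reduced matrix `X[:, keep]` can then be passed to any classifier. In practice the number of features to drop, or a threshold on the scores, would need to be tuned, and whether to standardize the features beforehand is a separate design decision that this sketch leaves open.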
Subject
Fluid Flow and Transfer Processes, Computer Science Applications, Process Chemistry and Technology, General Engineering, Instrumentation, General Materials Science