Affiliation:
1. Faculty of Sciences, USMBA, Fez, Morocco
2. Faculty of Sciences, UAE, Tetouan, Morocco
Abstract
Feature selection is an essential pre-processing step in data mining. It aims to identify a highly predictive subset of features from a large set of candidates. Several feature-selection approaches have been proposed in the literature. Random Forests (RF) are among the most widely used machine learning algorithms, not only for their excellent prediction accuracy but also for their ability to select informative variables through their associated variable importance measures. However, an RF model sometimes over-fits on noisy features, which leads to choosing those noisy features as the informative variables and eliminating significant ones; once the noisy features are removed, low-ranked features may become more important. In this study we propose a new variant of RF that provides unbiased variable selection, using a noisy-feature trick to address this problem. First, we add a noisy feature to the dataset. Second, the noisy feature serves as a stopping criterion: if it is selected as the best splitting feature, tree construction stops, because at that point the model has begun to over-fit on noisy features. Finally, the best subset of features is selected from the top-ranked features according to the Gini impurity of this new RF variant. To test the validity and effectiveness of the proposed method, we compare it against the RF variable importance measure on eleven benchmark datasets.
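The core idea of the noisy-feature trick can be sketched in a simplified form: inject a pure-noise column, fit a standard Random Forest, and keep only the features whose Gini importance exceeds that of the noise column. This is only an illustration of the principle, not the authors' exact method (their variant stops tree construction when the noise feature wins a split, which requires a custom tree builder); the dataset, seeds, and thresholding here are assumptions.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier

# Synthetic data standing in for any benchmark dataset.
X, y = make_classification(n_samples=300, n_features=10,
                           n_informative=4, random_state=0)

# Step 1: append an injected pure-noise feature as the last column.
rng = np.random.default_rng(0)
noise = rng.normal(size=(X.shape[0], 1))
X_aug = np.hstack([X, noise])

# Step 2: fit an RF; feature_importances_ is the mean Gini impurity decrease.
rf = RandomForestClassifier(n_estimators=200, random_state=0).fit(X_aug, y)
importances = rf.feature_importances_
noise_importance = importances[-1]   # importance of the noise column

# Step 3: keep only original features ranked above the noise feature.
selected = np.where(importances[:-1] > noise_importance)[0]
print(sorted(selected.tolist()))
```

Features whose importance does not beat a known-irrelevant column are treated as noise themselves and discarded, which is the same intuition the proposed RF variant builds into its stopping criterion.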
Cited by 20 articles.