Framework for the Ensemble of Feature Selection Methods

Author:

Mera-Gaona Maritza, López Diego M., Vargas-Canas Rubiel, Neumann Ursula

Abstract

Feature selection (FS) has attracted the attention of many researchers in the last few years due to the increasing size of datasets, which can contain hundreds or thousands of columns (features). Typically, not all columns carry relevant values. Noisy or irrelevant columns can therefore confuse the algorithms and weaken the performance of machine learning models. To overcome this problem, different FS algorithms have been proposed to analyze high-dimensional datasets and determine their subsets of relevant features. However, FS algorithms are very often biased by the data. Thus, ensemble feature selection (EFS) methods have become an alternative that integrates the advantages of single FS algorithms and compensates for their disadvantages. The objective of this research is to propose a conceptual and implementation framework to understand the main concepts and relationships in the process of aggregating FS algorithms and to demonstrate how to address FS on datasets with high dimensionality. The proposed conceptual framework is validated by deriving an implementation framework, which incorporates a set of Python packages with functionalities to support the assembly of feature selection algorithms. The performance of the implementation framework was demonstrated in several experiments discovering relevant features in the Sonar, SPECTF, and WDBC datasets. The experiments contrasted the accuracy of two machine learning classifiers (decision tree and logistic regression), trained either with subsets of features generated by single FS algorithms or with the set of features selected by the ensemble feature selection framework. We observed that for the three datasets used (Sonar, SPECTF, and WDBC), the highest precision percentages (86.95%, 74.73%, and 93.85%, respectively) were obtained when the classifiers were trained with the subset of features generated by our framework. Additionally, the stability of the feature sets generated using our ensemble method was evaluated. The results showed that the method achieved perfect stability for the three datasets used in the evaluation.
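To make the idea of aggregating FS algorithms concrete, the following is a minimal sketch, not the authors' framework. The abstract does not state which aggregation rule or stability measure the paper uses, so mean-rank aggregation and mean pairwise Jaccard similarity are assumptions chosen for illustration. The feature names and rankings are hypothetical, as are the function names `aggregate_rankings` and `jaccard_stability`.

```python
from itertools import combinations


def aggregate_rankings(rankings, k):
    """Combine several feature rankings (best feature first) by mean rank
    position and return the k features with the lowest mean rank.

    Assumption: mean-rank aggregation is used only as an illustration; the
    source does not specify the aggregation rule.
    """
    features = set(rankings[0])
    if any(set(r) != features for r in rankings):
        raise ValueError("all rankings must cover the same features")
    mean_rank = {
        f: sum(r.index(f) for r in rankings) / len(rankings)
        for f in features
    }
    # Sort by mean rank; break ties by feature name so the result is deterministic.
    return sorted(features, key=lambda f: (mean_rank[f], f))[:k]


def jaccard_stability(subsets):
    """Mean pairwise Jaccard similarity between selected feature subsets.

    A value of 1.0 means every run selected exactly the same features.
    Assumption: Jaccard similarity stands in for whatever stability
    measure the paper actually uses. Requires at least two non-empty subsets.
    """
    pairs = list(combinations([set(s) for s in subsets], 2))
    return sum(len(a & b) / len(a | b) for a, b in pairs) / len(pairs)


# Hypothetical rankings produced by three single FS methods.
rankings = [
    ["f1", "f3", "f2", "f4"],
    ["f3", "f1", "f4", "f2"],
    ["f1", "f2", "f3", "f4"],
]
selected = aggregate_rankings(rankings, k=2)
print(selected)

# Hypothetical subsets selected on three resamples of the same dataset.
print(jaccard_stability([["f1", "f3"], ["f1", "f3"], ["f1", "f2"]]))
```

In this sketch, a feature that one method ranks poorly can still be selected if the other methods rank it highly, which is the compensation effect the abstract attributes to ensembles. A stability of 1.0 would correspond to the "perfect stability" case, where every resample yields the same subset.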

Funder

Departamento Administrativo de Ciencia, Tecnología e Innovación

Publisher

MDPI AG

Subject

Fluid Flow and Transfer Processes, Computer Science Applications, Process Chemistry and Technology, General Engineering, Instrumentation, General Materials Science

Cited by 27 articles.

1. Supervised Rank Aggregation (SRA): A Novel Rank Aggregation Approach for Ensemble-based Feature Selection;Recent Advances in Computer Science and Communications;2024-05

2. Exploring Impact of Feature Selection on Classification Models for Detection of Alzheimer’s Disease;2024 International Conference on Knowledge Engineering and Communication Systems (ICKECS);2024-04-18

3. Data-Driven Methodology to Assess Raw Materials Impact on Manufacturing Systems Breakdowns;International Journal of Prognostics and Health Management;2024-04-17

4. A Comprehensive Study on Ensemble Feature Selection Techniques for Classification;2024 11th International Conference on Computing for Sustainable Global Development (INDIACom);2024-02-28

5. A hybrid spherical fuzzy AHP-MARCOS model for evaluating the condition of saltwater pipes in Hong Kong;Engineering, Construction and Architectural Management;2024-02-21
