Author:
Marjan Firouznia, Pietro Ruiu, Giuseppe A. Trunfio
Abstract
In many fields, it is common practice to collect large amounts of data characterized by a high number of features. These datasets are at the core of modern applications of supervised machine learning, where the goal is to create an automatic classifier for newly presented data. However, it is well known that the presence of irrelevant features in a dataset can make the learning phase harder and, most importantly, can lead to suboptimal classifiers. Consequently, it is becoming increasingly important to be able to select the right subset of features. Traditionally, optimization metaheuristics have been used with success in the task of feature selection. However, many of the approaches presented in the literature are not applicable to datasets with thousands of features because of the poor scalability of the optimization algorithms. In this article, we address the problem using a cooperative coevolutionary approach based on differential evolution. In the proposed algorithm, parallelized for execution on shared-memory architectures, a strategy that reduces the dimensionality of the search space and adjusts the population size during optimization yields significant performance improvements. A numerical investigation on some high-dimensional and medium-dimensional datasets shows that, in most cases, the proposed approach can achieve higher classification performance than other state-of-the-art methods.
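To illustrate the general idea of metaheuristic wrapper feature selection mentioned in the abstract, the sketch below applies a basic differential evolution (DE/rand/1/bin) loop to a binary feature mask. This is a minimal toy illustration, not the authors' parallel cooperative coevolutionary algorithm: the fitness function here is an assumed surrogate that rewards a hypothetical set of "relevant" features, whereas in a real wrapper it would be cross-validated classifier accuracy; all names and parameter values are illustrative assumptions.

```python
import random

random.seed(42)

N_FEATURES = 20
RELEVANT = set(range(5))  # toy ground truth: the first 5 features are "relevant"

def fitness(mask):
    # Toy surrogate for classifier quality: reward selected relevant features,
    # penalize selected irrelevant ones. A real wrapper would train and
    # cross-validate a classifier on the selected feature subset instead.
    hits = sum(1 for i, m in enumerate(mask) if m and i in RELEVANT)
    noise = sum(1 for i, m in enumerate(mask) if m and i not in RELEVANT)
    return hits - 0.5 * noise

def to_mask(vec, thr=0.5):
    # DE works on continuous vectors; threshold each component to get a
    # binary feature-selection mask.
    return [1 if v > thr else 0 for v in vec]

def de_feature_selection(pop_size=30, gens=100, F=0.5, CR=0.9):
    # Initialize a population of continuous vectors in [0, 1]^N_FEATURES.
    pop = [[random.random() for _ in range(N_FEATURES)] for _ in range(pop_size)]
    for _ in range(gens):
        for i in range(pop_size):
            # DE/rand/1 mutation: pick three distinct individuals (not i).
            a, b, c = random.sample([p for j, p in enumerate(pop) if j != i], 3)
            # Binomial crossover between the mutant and the current individual.
            trial = [
                a[d] + F * (b[d] - c[d]) if random.random() < CR else pop[i][d]
                for d in range(N_FEATURES)
            ]
            # Greedy selection on the decoded feature masks.
            if fitness(to_mask(trial)) >= fitness(to_mask(pop[i])):
                pop[i] = trial
    best = max(pop, key=lambda v: fitness(to_mask(v)))
    return to_mask(best)

mask = de_feature_selection()
print("selected features:", [i for i, m in enumerate(mask) if m])
```

The scalability issue the article targets is visible even in this sketch: each fitness call would cost a full classifier training on high-dimensional data, which is what motivates decomposing the feature space cooperatively and shrinking the search dimensionality during the run.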
Funder
Università degli Studi di Sassari
Publisher
Springer Science and Business Media LLC
Subject
Hardware and Architecture, Information Systems, Theoretical Computer Science, Software
Cited by
2 articles.