Utility metric for unsupervised feature selection

Authors:

Villa Amalia (1, 2), Mundanad Narayanan Abhijith (1, 2), Van Huffel Sabine (1, 2), Bertrand Alexander (1, 2), Varon Carolina (1, 3, 4)

Affiliations:

1. STADIUS Center for Dynamical Systems, Signal Processing and Data Analytics, Department of Electrical Engineering (ESAT), KU Leuven, Leuven, Belgium

2. Leuven.AI, KU Leuven Institute for AI, Leuven, Belgium

3. Circuits and Systems (CAS) Group, Delft University of Technology, Delft, The Netherlands

4. e-Media Research Lab, Campus GroepT, KU Leuven, Leuven, Belgium

Abstract

Feature selection techniques are useful approaches for dimensionality reduction in data analysis. They provide interpretable results by reducing the data to a subset of the original features. When the data lack annotations, unsupervised feature selectors are required. Several such algorithms exist in the literature, but despite their broad applicability they can be inaccessible or cumbersome to use, mainly because they require tuning non-intuitive parameters and have high computational demands. In this work, a publicly available, ready-to-use unsupervised feature selector is proposed that achieves results comparable to the state of the art at a much lower computational cost. The approach belongs to the family of spectral feature selectors. These methods generally consist of two stages: manifold learning and subset selection. In the first stage, the underlying structures in the high-dimensional data are extracted; in the second stage, a subset of the features is selected to replicate these structures. This paper makes two contributions, one for each stage. In the manifold learning stage, the effect of non-linearities in the data is explored using a radial basis function (RBF) kernel, and an alternative way of estimating the kernel parameter is presented for high-dimensional data. In the subset selection stage, a backward greedy approach based on the least-squares utility metric is proposed. Combining these new ingredients yields the utility metric for unsupervised feature selection (U2FS) algorithm. U2FS succeeds in selecting the correct features in simulations. In addition, its performance on benchmark datasets is comparable to the state of the art while requiring less computational time. Moreover, unlike state-of-the-art methods, U2FS does not require any parameter tuning.
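The two-stage spectral pipeline summarized above can be illustrated with a small NumPy sketch. It is not the authors' U2FS implementation, and several choices are assumptions: the manifold-learning stage uses Laplacian eigenmaps on an RBF graph without self-loops; when no kernel width is given, the median pairwise distance is used (a common heuristic standing in for the estimator the paper proposes); a tiny ridge term keeps the covariance invertible; and the embedding dimension `n_dims` is chosen by the user. All function names (`rbf_affinity`, `spectral_embedding`, `ls_utilities`, `backward_greedy_select`, `spectral_ls_select`) are hypothetical. The selection stage uses the least-squares utility U_k = ||w_k||^2 / [R^-1]_kk with R = X^T X / N, which equals the increase in least-squares cost when feature k is removed and the remaining weights are refitted.

```python
import numpy as np


def rbf_affinity(X, sigma=None):
    """Gaussian (RBF) affinity between the rows (samples) of X, no self-loops.

    Assumption: if sigma is None, the median pairwise distance is used
    (the "median heuristic"), NOT the kernel-parameter estimator of U2FS.
    """
    sq = np.sum(X**2, axis=1)
    d2 = np.maximum(sq[:, None] + sq[None, :] - 2.0 * X @ X.T, 0.0)
    if sigma is None:
        sigma = np.median(np.sqrt(d2[np.triu_indices_from(d2, k=1)]))
    K = np.exp(-d2 / (2.0 * sigma**2))
    np.fill_diagonal(K, 0.0)  # drop self-loops from the graph
    return K


def spectral_embedding(K, n_dims):
    """Stage 1 (manifold learning): Laplacian-eigenmap embedding.

    Solves L y = lambda D y with L = D - K via the equivalent symmetric
    problem on D^{-1/2} K D^{-1/2}; its largest eigenvalues correspond to
    the smallest generalized eigenvalues of the Laplacian.
    """
    d_inv_sqrt = 1.0 / np.sqrt(K.sum(axis=1))
    M = d_inv_sqrt[:, None] * K * d_inv_sqrt[None, :]
    eigvals, eigvecs = np.linalg.eigh(M)
    # Sort descending and skip the trivial eigenvector (eigenvalue 1).
    idx = np.argsort(eigvals)[::-1][1:n_dims + 1]
    return d_inv_sqrt[:, None] * eigvecs[:, idx]  # y = D^{-1/2} v


def ls_utilities(X, Y, ridge=0.0):
    """Least-squares utility of each column of X for predicting Y.

    U_k = ||w_k||^2 / [R^{-1}]_kk: the increase in the (ridge-regularized)
    mean squared LS cost if feature k is removed and the rest refitted.
    """
    N, p = X.shape
    R_inv = np.linalg.inv(X.T @ X / N + ridge * np.eye(p))
    W = R_inv @ (X.T @ Y / N)
    return np.sum(W**2, axis=1) / np.diag(R_inv)


def backward_greedy_select(X, Y, n_select, ridge=1e-6):
    """Stage 2 (subset selection): drop the least useful feature until
    n_select remain, recomputing all utilities after every removal."""
    selected = list(range(X.shape[1]))
    while len(selected) > n_select:
        u = ls_utilities(X[:, selected], Y, ridge)
        selected.pop(int(np.argmin(u)))
    return selected


def spectral_ls_select(X, n_select, n_dims, sigma=None):
    """Hypothetical end-to-end sketch: standardize, embed, select."""
    std = X.std(axis=0)
    Xs = (X - X.mean(axis=0)) / np.where(std > 0, std, 1.0)
    Y = spectral_embedding(rbf_affinity(Xs, sigma), n_dims)
    return sorted(backward_greedy_select(Xs, Y, n_select))


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    # Four clusters at the corners of a square in features 0 and 1;
    # features 2-4 are pure noise.
    corners = np.array([[5, 5], [5, -5], [-5, 5], [-5, -5]], dtype=float)
    labels = np.repeat(np.arange(4), 25)
    X = np.hstack([corners[labels] + rng.normal(size=(100, 2)),
                   rng.normal(size=(100, 3))])
    print(spectral_ls_select(X, n_select=2, n_dims=2, sigma=1.0))
```

For clarity, the sketch recomputes the inverse of R from scratch after each removal; the inverse can instead be downdated in closed form (block-matrix inversion), which matters when many features are eliminated one at a time.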

Funders

FWO project

European Research Council

Flemish Government

Bijzonder Onderzoeksfonds KU Leuven

Agentschap Innoveren en Ondernemen

European Commission

Publisher

PeerJ

Subject

General Computer Science

