PFA-Nipals: An Unsupervised Principal Feature Selection Based on Nonlinear Estimation by Iterative Partial Least Squares

Authors:

Castillo-Ibarra Emilio (1), Alsina Marco A. (2), Astudillo Cesar A. (3), Fuenzalida-Henríquez Ignacio (4)

Affiliation:

1. Engineering Systems Doctoral Program, Faculty of Engineering, Universidad de Talca, Campus Curicó, Curicó 3340000, Chile

2. Faculty of Engineering, Architecture and Design, Universidad San Sebastian, Bellavista 7, Santiago 8420524, Chile

3. Department of Computer Science, Faculty of Engineering, University of Talca, Campus Curicó, Curicó 3340000, Chile

4. Building Management and Engineering Department, Faculty of Engineering, University of Talca, Campus Curicó, Curicó 3340000, Chile

Abstract

Unsupervised feature selection (UFS) has received great interest in research areas that require dimensionality reduction, including machine learning, data mining, and statistical analysis. However, UFS algorithms are known to perform poorly on datasets with missing data, exhibiting a significant computational load and learning bias. In this work, we propose a novel and robust UFS method, designated PFA-Nipals, that works with missing data without the need for deletion or imputation. This is achieved through an iterative nonlinear estimation of principal components by partial least squares (NIPALS), while the relevant features are selected through mini-batch K-means clustering. The proposed method is successfully applied to select the relevant features of a robust health dataset with missing data, outperforming other UFS methods in terms of computational load and learning bias. Furthermore, the proposed method finds a consistent set of relevant features without biasing the explained variability, even as the proportion of missing data increases. Finally, the proposed method is expected to be applicable in fields such as machine learning and big data, with applications in the medical and engineering sciences.
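The abstract only summarizes the pipeline, so the following is a minimal NumPy sketch of the two stages it names, under stated assumptions rather than the authors' implementation. Stage one is NIPALS PCA that simply skips missing (NaN) cells when regressing scores and loadings, so no row is deleted and no value is imputed. Stage two assumes "PFA" means principal feature analysis: the rows of the loading matrix are clustered with mini-batch K-means and the feature nearest each cluster centre is kept. The function names (`nipals_pca`, `minibatch_kmeans`, `principal_feature_analysis`), the farthest-first seeding, and all default parameters are hypothetical choices made for illustration.

```python
import numpy as np


def nipals_pca(X, n_components, max_iter=500, tol=1e-10):
    """NIPALS PCA that skips missing (NaN) cells instead of deleting or imputing them.

    Returns scores T (n x k) and unit-norm loadings P (p x k).
    """
    X = np.asarray(X, dtype=float)
    mask = ~np.isnan(X)  # True where a value was observed
    # Column-centre with observed values only; missing cells become 0 and never contribute.
    E = np.where(mask, X - np.nanmean(X, axis=0), 0.0)
    n, p = E.shape
    T, P = np.zeros((n, n_components)), np.zeros((p, n_components))
    for k in range(n_components):
        t = E[:, np.argmax((E ** 2).sum(axis=0))].copy()  # start from the largest column
        for _ in range(max_iter):
            # Loadings: regress each column on t over its observed rows only.
            den_p = (mask * t[:, None] ** 2).sum(axis=0)
            p_vec = (E * t[:, None]).sum(axis=0) / np.where(den_p > 0, den_p, 1.0)
            p_vec /= np.linalg.norm(p_vec)
            # Scores: regress each row on p_vec over its observed columns only.
            den_t = (mask * p_vec[None, :] ** 2).sum(axis=1)
            t_new = (E * p_vec[None, :]).sum(axis=1) / np.where(den_t > 0, den_t, 1.0)
            done = np.linalg.norm(t_new - t) < tol * max(np.linalg.norm(t_new), 1.0)
            t = t_new
            if done:
                break
        T[:, k], P[:, k] = t, p_vec
        E = np.where(mask, E - np.outer(t, p_vec), 0.0)  # deflate observed cells only
    return T, P


def minibatch_kmeans(A, n_clusters, batch_size=32, n_iter=100, seed=0):
    """Mini-batch K-means with per-centre learning rate 1/count.

    Assumption: farthest-first seeding is used here for determinism;
    scikit-learn's MiniBatchKMeans defaults to k-means++ instead.
    """
    rng = np.random.default_rng(seed)
    idx = [int(rng.integers(len(A)))]
    for _ in range(1, n_clusters):
        d = ((A[:, None, :] - A[idx][None, :, :]) ** 2).sum(axis=2).min(axis=1)
        idx.append(int(d.argmax()))  # next seed: point farthest from existing seeds
    centers = A[idx].copy()
    counts = np.zeros(n_clusters)
    for _ in range(n_iter):
        batch = A[rng.choice(len(A), size=min(batch_size, len(A)), replace=False)]
        # Assign the whole batch to the current centres first, then update.
        d = ((batch[:, None, :] - centers[None, :, :]) ** 2).sum(axis=2)
        for x, c in zip(batch, d.argmin(axis=1)):
            counts[c] += 1
            eta = 1.0 / counts[c]
            centers[c] = (1.0 - eta) * centers[c] + eta * x
    return centers


def principal_feature_analysis(X, n_components, n_features, seed=0):
    """Hypothetical PFA-style selection on NIPALS loadings; returns column indices."""
    _, P = nipals_pca(X, n_components)
    # Each row of P describes how one feature loads on the retained components.
    centers = minibatch_kmeans(P, n_features, seed=seed)
    selected = []
    for c in centers:
        dist = ((P - c) ** 2).sum(axis=1)
        dist[np.array(selected, dtype=int)] = np.inf  # never pick a feature twice
        selected.append(int(dist.argmin()))
    return sorted(selected)
```

For example, `principal_feature_analysis(X, n_components=3, n_features=3)` on a matrix that contains NaN returns three column indices, one representative per group of features with similar loadings. A fuller version would more likely call scikit-learn's `MiniBatchKMeans`; the hand-written loop only keeps the sketch dependency-light.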

Funder

Doctoral Study Scholarship (Beca Estudio de Doctorado), Universidad de Talca

Faculty of Engineering, Campus Curicó, University of Talca

Publisher

MDPI AG

Subject

General Mathematics, Engineering (miscellaneous), Computer Science (miscellaneous)

