Evaluating the impact of multivariate imputation by MICE in feature selection

Author:

Mera-Gaona MaritzaORCID,Neumann Ursula,Vargas-Canas Rubiel,López Diego M.

Abstract

Handling missing values is a crucial step in preprocessing data in Machine Learning. Most available algorithms for analyzing datasets in the feature selection process and classification or estimation process analyze complete datasets. Consequently, in many cases, the strategy for dealing with missing values is to use only instances with full data or to replace missing values with a mean, mode, median, or a constant value. Usually, discarding missing samples or replacing missing values by means of fundamental techniques causes bias in subsequent analyzes on datasets. Aim: Demonstrate the positive impact of multivariate imputation in the feature selection process on datasets with missing values. Results: We compared the effects of the feature selection process using complete datasets, incomplete datasets with missingness rates between 5 and 50%, and imputed datasets by basic techniques and multivariate imputation. The feature selection algorithms used are well-known methods. The results showed that the datasets imputed by multivariate imputation obtained the best results in feature selection compared to datasets imputed by basic techniques or non-imputed incomplete datasets. Conclusions: Considering the results obtained in the evaluation, applying multivariate imputation by MICE reduces bias in the feature selection process.

Funder

colciencias, colombian agency of science, technology, and innovation

Publisher

Public Library of Science (PLoS)

Subject

Multidisciplinary

Reference48 articles.

1. Missing data: Our view of the state of the art;J. L. Schafer;Psychol. Methods,2002

2. Biometrika Trust Inference and Missing Data Author (s): Donald B. Rubin Published by: Oxford University Press on behalf of Biometrika Trust Stable URL: https://www.jstor.org/stable/2335739;D. B. Rubin;BiometrikaTrust,1976

3. Principled Approaches to Missing Data in Epidemiologic Studies;N. J. Perkins;Am. J. Epidemiol,2018

同舟云学术

1.学者识别学者识别

2.学术分析学术分析

3.人才评估人才评估

"同舟云学术"是以全球学者为主线,采集、加工和组织学术论文而形成的新型学术文献查询和分析系统,可以对全球学者进行文献检索和人才价值评估。用户可以通过关注某些学科领域的顶尖人物而持续追踪该领域的学科进展和研究前沿。经过近期的数据扩容,当前同舟云学术共收录了国内外主流学术期刊6万余种,收集的期刊论文及会议论文总量共计约1.5亿篇,并以每天添加12000余篇中外论文的速度递增。我们也可以为用户提供个性化、定制化的学者数据。欢迎来电咨询!咨询电话:010-8811{复制后删除}0370

www.globalauthorid.com

TOP

Copyright © 2019-2024 北京同舟云网络信息技术有限公司
京公网安备11010802033243号  京ICP备18003416号-3