The Feature Selection Effect on Missing Value Imputation of Medical Datasets

Published: 2020-03-29 Issue: 7 Volume: 10 Page: 2344
ISSN: 2076-3417
Container-title: Applied Sciences
Language: en
Short-container-title: Applied Sciences

Author:

Liu Chia-Hui, Tsai Chih-Fong, Sue Kuen-Liang, Huang Min-Wei

Abstract

In practice, many medical datasets are incomplete: a proportion of their records have missing attribute values. Missing value imputation can be used to address this problem. To impute missing values, some of the observed data (i.e., complete data) are generally used as the reference or training set, and statistical or machine learning techniques are then employed to produce estimates that replace the missing values. Because a collected dataset usually contains a certain number of feature dimensions, feature selection can be useful for better pattern recognition. Therefore, this paper examines the effect of performing feature selection on missing value imputation of medical datasets. Experiments are carried out on five medical datasets with various feature dimensions, and three different types of feature selection methods and imputation techniques are compared. The results show that combining feature selection and imputation is a better choice for many medical datasets. However, the feature selection algorithm should be chosen carefully to produce the best result. In particular, the genetic algorithm and information gain models are suitable for lower dimensional datasets, whereas the decision tree model is a better choice for higher dimensional datasets.
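The workflow described in the abstract (select features using the complete data, then impute using the complete data as the reference set) can be illustrated with a minimal sketch. It is not the authors' implementation: it assumes discrete-valued features, uses information gain for selection (one of the methods the abstract names), and swaps in a simple k-nearest-neighbour mean as the imputation step, which the abstract does not specify. The function names, the toy data, and the use of `None` to mark a missing value are all hypothetical.

```python
import math
from collections import Counter

def entropy(labels):
    """Shannon entropy (in bits) of a list of class labels."""
    n = len(labels)
    return -sum((c / n) * math.log2(c / n) for c in Counter(labels).values())

def information_gain(column, labels):
    """Entropy reduction in `labels` after splitting on a discrete feature."""
    n = len(labels)
    groups = {}
    for value, label in zip(column, labels):
        groups.setdefault(value, []).append(label)
    remainder = sum(len(g) / n * entropy(g) for g in groups.values())
    return entropy(labels) - remainder

def select_features(rows, labels, k):
    """Rank features by information gain on complete rows; return top-k indices."""
    complete = [(r, y) for r, y in zip(rows, labels) if None not in r]
    ys = [y for _, y in complete]
    gains = [information_gain([r[j] for r, _ in complete], ys)
             for j in range(len(rows[0]))]
    ranked = sorted(range(len(gains)), key=lambda j: gains[j], reverse=True)
    return sorted(ranked[:k])

def knn_impute(rows, features, k=3):
    """Fill None entries in the selected features with the mean of the
    k nearest complete rows (assumes at least one complete row exists)."""
    reduced = [[r[j] for j in features] for r in rows]
    reference = [r for r in reduced if None not in r]
    imputed = []
    for r in reduced:
        observed = [i for i, v in enumerate(r) if v is not None]
        # Squared distance over the entries that are observed in this row.
        nearest = sorted(
            reference,
            key=lambda c: sum((c[i] - r[i]) ** 2 for i in observed),
        )[:k]
        imputed.append([
            v if v is not None else sum(c[i] for c in nearest) / len(nearest)
            for i, v in enumerate(r)
        ])
    return imputed

# Toy data (hypothetical): feature 1 is noise, and the last row is incomplete.
rows = [[0, 1, 0], [0, 0, 0], [1, 1, 1], [1, 0, 1], [1, 1, None]]
labels = [0, 0, 1, 1, 1]

selected = select_features(rows, labels, k=2)
completed = knn_impute(rows, selected, k=2)
print(selected, completed[-1])
```

In this sketch the ranking is computed on complete rows only, so the missing entries never influence which features are kept; whether that ordering of steps matches the paper's experimental setup is an assumption.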

Funder

Ministry of Science and Technology, Taiwan

Publisher

MDPI AG

Subject

Fluid Flow and Transfer Processes, Computer Science Applications, Process Chemistry and Technology, General Engineering, Instrumentation, General Materials Science

Link

https://www.mdpi.com/2076-3417/10/7/2344/pdf

References: 47 articles.

1. Imputation of missing values is superior to complete case analysis and the missing-indicator method in multivariable diagnostic research: A clinical example

2. Missing value estimation methods for DNA microarrays

3. Missing value imputation strategies for metabolomics data

4. Comparison of Random Forest and Parametric Imputation Models for Imputing Missing Data Using MICE: A CALIBER Study

5. Missing value imputation in high-dimensional phenomic data: imputable or not, and how?

Cited by 28 articles.

1. A novel feature selection framework for incomplete data;Chemometrics and Intelligent Laboratory Systems;2024-09

2. Leveraging Quadratic Polynomials in Python for Advanced Data Analysis;F1000Research;2024-08-20

3. Spatiotemporal models of dengue epidemiology in the Philippines: Integrating remote sensing and interpretable machine learning;Acta Tropica;2024-07

4. Feature Selection Techniques for CR Isotope Identification with the AMS-02 Experiment in Space;Particles;2024-04-20

5. Tackling data challenges in forecasting effluent characteristics of wastewater treatment plants;Journal of Environmental Management;2024-03