Author:
Liu Chia-Hui,Tsai Chih-Fong,Sue Kuen-Liang,Huang Min-Wei
Abstract
In practice, many medical domain datasets are incomplete, containing a proportion of incomplete data with missing attribute values. Missing value imputation can be performed to solve the problem of incomplete datasets. To impute missing values, some of the observed data (i.e., complete data) are generally used as the reference or training set, and then the relevant statistical and machine learning techniques are employed to produce estimations to replace the missing values. Since the collected dataset usually contains a certain number of feature dimensions, it is useful to perform feature selection for better pattern recognition. Therefore, the aim of this paper is to examine the effect of performing feature selection on missing value imputation of medical datasets. Experiments are carried out on five different medical domain datasets containing various feature dimensions. In addition, three different types of feature selection methods and imputation techniques are employed for comparison. The results show that combining feature selection and imputation is a better choice for many medical datasets. However, the feature selection algorithm should be carefully chosen in order to produce the best result. Particularly, the genetic algorithm and information gain models are suitable for lower dimensional datasets, whereas the decision tree model is a better choice for higher dimensional datasets.
Funder
Ministry of Science and Technology, Taiwan
Subject
Fluid Flow and Transfer Processes,Computer Science Applications,Process Chemistry and Technology,General Engineering,Instrumentation,General Materials Science
Cited by
28 articles.
订阅此论文施引文献
订阅此论文施引文献,注册后可以免费订阅5篇论文的施引文献,订阅后可以查看论文全部施引文献