
A Classification Method for Incomplete Mixed Data Using Imputation and Feature Selection

Published: 2024-07-09  Volume: 14  Issue: 14  Page: 5993
ISSN: 2076-3417
Container-title: Applied Sciences
Language: en
Short-container-title: Applied Sciences

Author:

Li Gengsong¹, Zheng Qibin², Liu Yi², Li Xiang², Qin Wei², Diao Xingchun¹

Affiliation:

1. National Innovation Institute of Defense Technology, Beijing 100071, China

2. Academy of Military Sciences, Beijing 100091, China

Abstract

Missing data is a ubiquitous problem in real-world systems that adversely affects the performance of machine learning algorithms. Although many useful imputation methods are available to address this issue, they often fail to consider the information provided by both features and labels. As a result, the performance of these methods might be constrained. Furthermore, feature selection, as a data quality improvement technique, has been widely used and has demonstrated its effectiveness. To overcome the limitations of imputation methods, we propose a novel algorithm that combines data imputation and feature selection to tackle classification problems for mixed data. Based on the mean and standard deviation of quantitative features and the selection probabilities of the unique values of categorical features, our algorithm constructs different imputation models for quantitative and categorical features. Particle swarm optimization is used to optimize the parameters of the imputation models and to select feature subsets simultaneously. Additionally, we introduce a legacy learning mechanism to enhance the optimization capability of our method. To evaluate the performance of the proposed method, seven algorithms and twelve datasets are used for comparison. The results show that our algorithm outperforms the other algorithms in terms of accuracy and F1 score and has reasonable time overhead.
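The abstract describes a single particle that encodes both imputation parameters and a feature subset, but it does not give the encoding itself. The Python sketch below illustrates that idea under stated assumptions, not the authors' design: particle entries lie in [0, 1]; a quantitative feature's missing entries are filled with mean + c * std, where c = 2p - 1; a categorical feature's missing entries take the level that maximizes its observed frequency times a per-level particle weight; and a feature is selected when its switch exceeds 0.5. The names `decode`, `impute`, and `pso_step` are hypothetical, and the update rule is the textbook inertia-weight PSO, not the paper's variant.

```python
import numpy as np

# Hypothetical sketch: the particle layout, decoding rules, and helper names below are
# illustrative assumptions, not the encoding used in the paper.


def decode(position, n_quant, cat_levels, threshold=0.5):
    """Split one particle (entries in [0, 1]) into imputation parameters and a feature mask.

    Assumed layout: [one coefficient per quantitative feature |
                     one weight per level of each categorical feature |
                     one selection switch per feature (quantitative first)].
    """
    pos = np.asarray(position, dtype=float)
    coeffs = 2.0 * pos[:n_quant] - 1.0  # map [0, 1] to [-1, 1] standard deviations
    idx = n_quant
    weights = []
    for k in cat_levels:  # k = number of unique values of one categorical feature
        weights.append(pos[idx:idx + k])
        idx += k
    mask = pos[idx:idx + n_quant + len(cat_levels)] > threshold  # True = feature kept
    return coeffs, weights, mask


def impute(X_quant, X_cat, coeffs, weights):
    """Fill NaNs with mean + c * std and missing category codes (-1) with the top-scoring level."""
    Xq = np.array(X_quant, dtype=float)
    mu = np.nanmean(Xq, axis=0)
    sd = np.nanstd(Xq, axis=0)
    rows, cols = np.where(np.isnan(Xq))
    Xq[rows, cols] = mu[cols] + coeffs[cols] * sd[cols]

    Xc = np.array(X_cat, dtype=int)
    for j, w in enumerate(weights):
        observed = Xc[Xc[:, j] >= 0, j]
        freq = np.bincount(observed, minlength=len(w)) / max(len(observed), 1)
        # Score each level by its observed frequency scaled by the particle's weight.
        Xc[Xc[:, j] < 0, j] = int(np.argmax(freq * w))
    return Xq, Xc


def pso_step(pos, vel, pbest, gbest, rng, w=0.7, c1=1.5, c2=1.5):
    """One canonical inertia-weight PSO update; positions are clipped to [0, 1]."""
    r1 = rng.random(pos.shape)
    r2 = rng.random(pos.shape)
    vel = w * vel + c1 * r1 * (pbest - pos) + c2 * r2 * (gbest - pos)
    return np.clip(pos + vel, 0.0, 1.0), vel
```

In a full run, each particle would be decoded, the training data imputed, a classifier trained on the selected columns, and its validation score used as fitness to update `pbest` and `gbest`. That fitness loop and the legacy learning mechanism mentioned in the abstract are deliberately not modeled here. Encoding everything in one vector is what lets a single swarm tune imputation and feature selection jointly.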

Funder

National Science Foundation for Young Scientists of China

Young Elite Scientists Sponsorship Program by CAST

Publisher

MDPI AG

Link

https://www.mdpi.com/2076-3417/14/14/5993/pdf

References (34 articles)

1. Mirzaei (2022). Missing data in surveys: Key concepts, approaches, and applications. Res. Soc. Adm. Pharm.

2. Gu (2021). Improving the quality of web-based data imputation with crowd intervention. IEEE Trans. Knowl. Data Eng.

3. Luo, Y. (2022). Evaluating the state of the art in missing data imputation for clinical data. Brief. Bioinform., 23.

4. Lyngdoh (2022). Prediction of concrete strengths enabled by missing data imputation and interpretable machine learning. Cem. Concr. Compos.

5. Alabadla (2022). Systematic review of using machine learning in imputing missing values. IEEE Access.