Multiple Imputation Ensembles (MIE) for Dealing with Missing Data

Published:2020-04-23 Issue:3 Volume:1 Page:
ISSN:2662-995X
Container-title:SN Computer Science
language:en
Short-container-title:SN COMPUT. SCI.

Author:

Aleryani Aliya, Wang Wenjia, de la Iglesia Beatriz

Abstract

Missing data is a significant issue in many real-world datasets, yet there are no robust methods for dealing with it appropriately. In this paper, we propose a robust approach to dealing with missing data in classification problems: Multiple Imputation Ensembles (MIE). Our method integrates two approaches, multiple imputation and ensemble methods, and we compare two types of ensembles: bagging and stacking. We also propose a robust experimental set-up using 20 benchmark datasets from the UCI machine learning repository. For each dataset, we introduce increasing amounts of data Missing Completely at Random. First, we use a number of single and multiple imputation methods to recover the missing values, and then we ensemble a number of different classifiers built on the imputed data. We assess the quality of the imputation using dissimilarity measures. We also evaluate MIE performance by comparing classification accuracy on the complete and imputed data. Furthermore, we use the accuracy of simple imputation as a benchmark for comparison. We find that our proposed approach, which combines multiple imputation with ensemble techniques, outperforms the others, particularly as the amount of missing data increases.
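
The pipeline described in the abstract (inject Missing Completely at Random values, impute several times, train one classifier per imputed copy, combine the predictions) can be illustrated with a minimal sketch. This is not the authors' implementation. Every function name here is hypothetical, random draws from the observed column values stand in for the paper's imputation methods, a nearest-centroid classifier stands in for its base classifiers, and a simple majority vote stands in for the bagging and stacking ensembles. The two-cluster toy data is synthetic and only serves to make the example runnable.

```python
import random
from collections import Counter

def make_mcar(rows, rate, rng):
    """Copy rows, setting each value to None with probability `rate` (MCAR)."""
    return [[None if rng.random() < rate else v for v in row] for row in rows]

def impute_once(rows, rng):
    """Fill each missing value with a random draw from that column's observed values.
    Assumption: this hot-deck style draw is a stand-in, not the paper's imputer."""
    n_cols = len(rows[0])
    observed = [[r[j] for r in rows if r[j] is not None] for j in range(n_cols)]
    return [
        [rng.choice(observed[j] or [0.0]) if v is None else v for j, v in enumerate(r)]
        for r in rows
    ]

def fit_centroids(rows, labels):
    """Nearest-centroid classifier: store the mean feature vector of each class."""
    counts = Counter(labels)
    sums = {}
    for row, y in zip(rows, labels):
        acc = sums.setdefault(y, [0.0] * len(row))
        for j, v in enumerate(row):
            acc[j] += v
    return {y: [s / counts[y] for s in acc] for y, acc in sums.items()}

def predict(centroids, row):
    """Return the class whose centroid is closest in squared Euclidean distance."""
    def dist(c):
        return sum((a - b) ** 2 for a, b in zip(row, c))
    return min(centroids, key=lambda y: dist(centroids[y]))

def mie_fit(rows_missing, labels, m, rng):
    """Build m imputed copies of the data and fit one classifier on each."""
    return [fit_centroids(impute_once(rows_missing, rng), labels) for _ in range(m)]

def mie_predict(models, row):
    """Combine the m classifiers by majority vote."""
    votes = Counter(predict(c, row) for c in models)
    return votes.most_common(1)[0][0]

# Synthetic toy data: two well-separated Gaussian clusters (illustration only).
rng = random.Random(0)
rows = [[rng.gauss(0, 1), rng.gauss(0, 1)] for _ in range(30)]
rows += [[rng.gauss(6, 1), rng.gauss(6, 1)] for _ in range(30)]
labels = [0] * 30 + [1] * 30

rows_missing = make_mcar(rows, 0.2, rng)          # 20% of values removed at random
models = mie_fit(rows_missing, labels, m=5, rng=rng)
print(mie_predict(models, [0.0, 0.0]), mie_predict(models, [6.0, 6.0]))
```

Because each of the m imputed copies fills the gaps differently, the classifiers disagree where the imputation is uncertain, and the vote averages over that uncertainty. A single imputation would commit to one guess instead.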

Funder

Business and Local Government Data Research Centre

Publisher

Springer Science and Business Media LLC

Link

https://link.springer.com/content/pdf/10.1007/s42979-020-00131-0.pdf

References: 76 articles.

1. Abayomi K, Gelman A, Levy M. Diagnostics for multivariate imputations. J R Stat Soc Ser C (Appl Stat). 2008;57(3):273–91.

2. Aleryani A, Wang W, De La Iglesia B. Dealing with missing data and uncertainty in the context of data mining. In: International conference on hybrid artificial intelligence systems, Springer, p. 289–301; 2018.

3. Azur MJ, Stuart EA, Frangakis C, Leaf PJ. Multiple imputation by chained equations: what is it and how does it work? Int J Methods Psychiatr Res. 2011;20(1):40–9.

4. Batista GE, Monard MC. An analysis of four missing data treatment methods for supervised learning. Appl Artif Intell. 2003;17(5–6):519–33.

5. Boser BE, Guyon IM, Vapnik VN. A training algorithm for optimal margin classifiers. In: Proceedings of the fifth annual workshop on Computational learning theory, ACM, p. 144–152; 1992.

Cited by 23 articles.

1. Association between blood lipid levels in early pregnancy and urinary organophosphate metabolites in the Japan Environment and Children’s Study;Environment International;2024-08

2. A hybridization of multiple imputation and one-class bagging ensemble approach for missing value and class imbalance problem;Evolving Systems;2024-07-13

3. Fusion Learning of Regression Models for Missing Data Imputation in Breast Cancer Dataset;2023 International Conference on Artificial Intelligence for Innovations in Healthcare Industries (ICAIIHI);2023-12-29

4. Optimization of missing value imputation for neural networks;Information Sciences;2023-11

5. Periconceptional maternal diet quality and offspring wheeze trajectories: Japan Environment and Children's Study;Allergy;2023-10-18