Predictive Mean Matching Imputation Procedure Based on Machine Learning Models for Complex Survey Data

Published: 2024 Page: 456-468
ISSN: 1680-743X
Container-title: Journal of Data Science
Language: en

Authors:

Chen Sixia, Xu Chao

Abstract

Missing data is a common occurrence in various fields, spanning social science, education, economics, and biomedical research. Disregarding missing data in statistical analyses can introduce bias to study outcomes. To mitigate this issue, imputation methods have proven effective in reducing nonresponse bias and generating complete datasets for subsequent analysis of secondary data. The efficacy of imputation methods hinges on the assumptions of the underlying imputation model. While machine learning techniques such as regression trees, random forest, XGBoost, and deep learning have demonstrated robustness against model misspecification, their optimal performance may necessitate fine-tuning under specific conditions. Moreover, imputed values generated by these methods can sometimes deviate unnaturally, falling outside the normal range. To address these challenges, we propose a novel Predictive Mean Matching imputation (PMM) procedure that leverages popular machine learning-based methods. PMM strikes a balance between robustness and the generation of appropriate imputed values. In this paper, we present our innovative PMM approach and conduct a comparative performance analysis through Monte Carlo simulation studies, assessing its effectiveness against other established methods.
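
The abstract's point that predictive mean matching keeps imputed values within a plausible range can be illustrated with a minimal sketch of generic nearest-neighbor PMM. This is not the authors' procedure: the function names `fit_line` and `pmm_impute` are hypothetical, the toy data are made up for illustration, survey weights are ignored, and a one-covariate least-squares fit stands in for the machine learning predictors (regression trees, random forest, XGBoost, deep learning) that the paper considers. Any predictor with the same calling convention could be passed as `fit`.

```python
from typing import Callable, List, Optional, Sequence


def fit_line(x: Sequence[float], y: Sequence[float]) -> Callable[[float], float]:
    """One-covariate least squares; an assumed stand-in for any ML predictor."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    slope = sxy / sxx if sxx else 0.0
    intercept = mean_y - slope * mean_x
    return lambda value: intercept + slope * value


def pmm_impute(
    x: Sequence[float],
    y: Sequence[Optional[float]],
    fit: Callable[[Sequence[float], Sequence[float]], Callable[[float], float]] = fit_line,
) -> List[float]:
    """Fill each missing y (None) with the observed y of the nearest donor.

    "Nearest" is measured on the predicted means, not on the raw covariate.
    """
    donors = [i for i, yi in enumerate(y) if yi is not None]
    if not donors:
        raise ValueError("need at least one observed value to act as a donor")

    # Step 1: fit the prediction model on respondents only.
    predict = fit([x[i] for i in donors], [y[i] for i in donors])

    # Step 2: predict for every unit, respondents and nonrespondents alike.
    predicted = [predict(xi) for xi in x]

    # Step 3: copy the observed value of the donor with the closest prediction.
    completed = list(y)
    for i, yi in enumerate(y):
        if yi is None:
            nearest = min(donors, key=lambda j: abs(predicted[j] - predicted[i]))
            completed[i] = y[nearest]
    return completed


# Toy data (illustrative only): None marks a missing response.
x = [1.0, 2.4, 3.0, 4.0, 5.2, 6.0]
y = [1.2, None, 2.9, 4.1, None, 6.3]
completed = pmm_impute(x, y)
print(completed)
```

Because every imputed value is copied from an observed respondent, it cannot fall outside the range of the observed data, whatever the underlying predictor extrapolates to.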

Publisher

School of Statistics, Renmin University of China

Cited by 1 article.

1. Introduction to the GASP Special Issue;Journal of Data Science;2024