Affiliation:
1. PhD candidate, Institute of Statistics, Université de Neuchâtel, Av. de Bellevaux 51, 2000 Neuchâtel, Switzerland
2. Professor, Institute of Statistics, Université de Neuchâtel, Av. de Bellevaux 51, 2000 Neuchâtel, Switzerland
Abstract
Imputation procedures are frequently used to treat nonresponse. With random hot deck imputation, missing values are replaced by valid observed values from other units in the same dataset. The recently developed balanced nearest neighbor imputation method, implemented in the SwissCheese R package, generates random hot deck imputations under certain balancing constraints to decrease the variance of the total estimator in the presence of multivariate nonresponse. The method relies on a notion of neighborhood between units, utilizing a distance measure that becomes difficult to define in high dimensions. In contrast to hot deck imputation methods, many imputation procedures obtain replacement values from prediction models fitted to the observed data. The missForest method, which uses random forests as prediction models, is an example of this approach. In this article, we propose a new approach that uses the two methods in a complementary manner. We refine the distance measure in the SwissCheese method using missForest predictions. Through a simulation study on empirical data from the Swiss Survey on Income and Living Conditions, we demonstrate reductions in Monte Carlo variance, bias, and mean squared error of the totals obtained by our proposed imputed estimator compared to those obtained using SwissCheese alone.
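As a rough illustration of the idea described in the abstract, the following Python sketch performs a random hot deck imputation in which the neighborhood of a nonrespondent is defined by distance between predicted values rather than between raw covariates. It is a minimal sketch under stated assumptions, not the SwissCheese or missForest implementation: the function names `fit_line` and `hot_deck_impute` are hypothetical, a least-squares line stands in for the random forest, only one variable is imputed, and the balancing constraints are omitted entirely.

```python
import random


def fit_line(xs, ys):
    """Least-squares line; an assumed stand-in for a prediction model
    such as a random forest (NOT the missForest algorithm)."""
    n = len(xs)
    mx = sum(xs) / n
    my = sum(ys) / n
    sxx = sum((x - mx) ** 2 for x in xs)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    slope = sxy / sxx if sxx else 0.0
    return lambda x: my + slope * (x - mx)


def hot_deck_impute(x, y, k=3, seed=0):
    """Random hot deck: each missing y is replaced by the observed y of a
    donor drawn at random among the k respondents whose predicted values
    are closest to the nonrespondent's predicted value."""
    rng = random.Random(seed)
    # Respondents (units with an observed y) are the potential donors.
    donors = [i for i, v in enumerate(y) if v is not None]
    predict = fit_line([x[i] for i in donors], [y[i] for i in donors])
    pred = [predict(v) for v in x]
    imputed = list(y)
    for i, v in enumerate(y):
        if v is None:
            # Distance is measured on predictions, not on raw covariates.
            nearest = sorted(donors, key=lambda j: abs(pred[i] - pred[j]))[:k]
            imputed[i] = y[rng.choice(nearest)]
    return imputed


# Toy data: None marks a nonrespondent.
x = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0]
y = [10.0, None, 31.0, 39.0, None, 62.0]
completed = hot_deck_impute(x, y, k=2)
print(completed)
```

Every imputed value is an actually observed value from a donor, which is the defining property of hot deck imputation; the prediction model only determines which respondents count as neighbors.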
Publisher
Oxford University Press (OUP)
Subject
Applied Mathematics; Statistics, Probability and Uncertainty; Social Sciences (miscellaneous); Statistics and Probability