Affiliation:
1. Department of Statistics, TU Dortmund University, Dortmund, Germany
2. Federal Statistical Office of Germany (DESTATIS), Wiesbaden, Germany
Abstract
In statistical survey analysis, (partial) non-response is an inherent part of data acquisition, so the treatment of missing values during data preparation and analysis is a non-trivial task. Focusing on the German Structure of Earnings data from the Federal Statistical Office of Germany (DESTATIS), we investigate various imputation methods with regard to their imputation accuracy and its impact on parameter estimates in the analysis phase after imputation. Since imputation accuracy measures are not uniquely determined in theory or practice, we study different measures for assessing imputation accuracy: beyond the most common measures, the normalized root mean squared error (NRMSE) and the proportion of false classification (PFC), we put a special focus on (distribution) distance measures. The aim is to deliver guidelines for correctly assessing distributional accuracy after imputation and its potential effect on parameter estimates such as the mean gross income. Our empirical findings indicate a discrepancy between the NRMSE and PFC on the one hand and distance measures on the other. While the latter measure distributional similarity, NRMSE and PFC focus on data reproducibility. We find that a low NRMSE or PFC is in general not accompanied by lower distributional discrepancies. However, distribution-based measures correspond with more accurate parameter estimates such as the mean gross income under the (multiple) imputation scheme.
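The contrast between reproducibility measures and distance measures can be illustrated with a minimal Python sketch. Everything in it is an assumption for illustration and not taken from the paper: the function names are hypothetical, the NRMSE is normalized by the variance of the true values (one common convention among several), and the two-sample Kolmogorov-Smirnov statistic stands in for a distribution distance measure, which may differ from the measures the authors actually used.

```python
import math
from bisect import bisect_right


def nrmse(true_vals, imputed_vals):
    """RMSE over the imputed entries, normalized by the (population)
    standard deviation of the true values. Assumed convention."""
    n = len(true_vals)
    mse = sum((t - i) ** 2 for t, i in zip(true_vals, imputed_vals)) / n
    mean_true = sum(true_vals) / n
    var_true = sum((t - mean_true) ** 2 for t in true_vals) / n
    return math.sqrt(mse / var_true)


def pfc(true_labels, imputed_labels):
    """Proportion of imputed categorical entries that differ from the truth."""
    wrong = sum(t != i for t, i in zip(true_labels, imputed_labels))
    return wrong / len(true_labels)


def ks_distance(sample_a, sample_b):
    """Two-sample Kolmogorov-Smirnov statistic: the largest absolute
    difference between the two empirical distribution functions."""
    a, b = sorted(sample_a), sorted(sample_b)
    points = sorted(set(a) | set(b))
    return max(
        abs(bisect_right(a, x) / len(a) - bisect_right(b, x) / len(b))
        for x in points
    )


# Toy data (made up): true values that were masked, and two imputations.
true_vals = [1.0, 2.0, 3.0, 4.0, 5.0]
mean_imputed = [3.0] * 5                    # every gap filled with the mean
draw_imputed = [5.0, 4.0, 3.0, 2.0, 1.0]    # right distribution, wrong cells

nrmse_mean = nrmse(true_vals, mean_imputed)
nrmse_draw = nrmse(true_vals, draw_imputed)
ks_mean = ks_distance(true_vals, mean_imputed)
ks_draw = ks_distance(true_vals, draw_imputed)
pfc_example = pfc(["a", "b", "a", "c"], ["a", "b", "b", "c"])

print(nrmse_mean, nrmse_draw, ks_mean, ks_draw, pfc_example)
```

In this toy setting, mean imputation attains the lower NRMSE while collapsing the distribution to a single point, whereas the permuted draws reproduce the distribution exactly but score worse on NRMSE. This mirrors, in miniature, the discrepancy the abstract describes.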
Subject
Statistics, Probability and Uncertainty, Economics and Econometrics, Management Information Systems
Cited by
6 articles.