A Semiparametric Multiple Imputation Approach to Fully Synthetic Data for Complex Surveys

Published: 2022-05-25; Issue: 3; Volume: 10; Pages: 618-641
ISSN: 2325-0984
Container-title: Journal of Survey Statistics and Methodology
Language: en

Author:

Yu Mandi, He Yulei, Raghunathan Trivellore E

Abstract

Data synthesis is an effective statistical approach for reducing data disclosure risk. Generating fully synthetic data might minimize such risk, but modeling and applying it can be difficult for data from large, complex surveys. This article extends two-stage imputation to simultaneously impute item missing values and generate fully synthetic data. A new combining rule was developed for making inferences from data generated in this manner. Two semiparametric missing data imputation models were adapted to generate fully synthetic data for skewed continuous variables and sparse binary variables, respectively. The proposed approach was evaluated using simulated data and real longitudinal data from the Health and Retirement Study. Using the real data, it was also compared with two existing synthesis approaches: (1) parametric regression models as implemented in IVEware; and (2) nonparametric Classification and Regression Trees as implemented in the synthpop package for R. The results show that the proposed strategy maintains high data utility for a wide variety of descriptive and model-based statistics. It also performs better than the existing methods for sophisticated analyses such as factor analysis.

Funder

National Institute of Child Health and Human Development

Publisher

Oxford University Press (OUP)

Subject

Applied Mathematics; Statistics, Probability and Uncertainty; Social Sciences (miscellaneous); Statistics and Probability

Link

https://academic.oup.com/jssam/article-pdf/10/3/618/44275484/smac016.pdf

References (56 articles)

1. On the Existence of Maximum Likelihood Estimates in Logistic Regression Models; Albert; Biometrika, 1984

2. Estimating Optimal Transformations for Multiple Regression and Correlation; Breiman; Journal of the American Statistical Association, 1985

3. Graphical and Numerical Diagnostic Tools to Assess Suitability of Multiple Imputations and Imputation Models; Bondarenko; Statistics in Medicine, 2016

Cited by 1 article.

1. Advancing microdata privacy protection: A review of synthetic data methods; WIREs Computational Statistics; 2023-11-13