Abstract
Providing access to synthetic micro-data in place of confidential data to protect the privacy of participants is common practice. For the synthetic data to be useful for analysis, its density function must closely approximate that of the confidential data. Hence, accurately estimating the density function from sample micro-data is important. Existing kernel-based, copula-based, and machine learning methods of joint density estimation may not be viable. Applying the multivariate moment problem to sample-based density estimation has long been considered impractical, both because of its computational complexity and because optimal parameter selection for the density estimate is intractable when the true joint density function is unknown. This paper introduces a generalised form of the sample moment-based density estimate, which can be used to estimate joint density functions when only empirical moments are available. We demonstrate optimal parametrisation of the moment-based density estimate based solely on sample data by employing a computational strategy for parameter selection. We compare the performance of the moment-based estimate to that of existing non-parametric and parametric density estimation methods. The results show that empirical moments can provide a reasonable, robust non-parametric approximation of a joint density function that is comparable to existing non-parametric methods. We provide an example of synthetic data generation from the moment-based density estimate and show that the resulting synthetic data provides a reasonable disclosure-protected alternative for public release.
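To make the idea of a moment-based density estimate concrete, the following is a minimal univariate sketch, not the generalised multivariate estimator the paper introduces. It assumes the variable has been rescaled to [-1, 1] and uses a Legendre polynomial expansion whose coefficients are computed purely from empirical moments. The basis choice, the fixed order, and the function names `moment_density` and `evaluate_density` are illustrative assumptions that do not come from the abstract.

```python
import numpy as np
from numpy.polynomial import legendre as L


def moment_density(sample, order):
    """Legendre coefficients of a moment-based density estimate on [-1, 1].

    Assumption: the data already lie in [-1, 1]. Only the empirical
    moments m_0..m_order of the sample are used.
    """
    x = np.asarray(sample, dtype=float)
    # Empirical raw moments m_j = mean(x**j), j = 0..order
    moments = np.array([np.mean(x ** j) for j in range(order + 1)])
    coefs = np.zeros(order + 1)
    for k in range(order + 1):
        basis = np.zeros(k + 1)
        basis[k] = 1.0
        # Monomial coefficients of the Legendre polynomial P_k
        power = L.leg2poly(basis)
        # E[P_k(X)] is a linear combination of the raw moments;
        # (2k + 1) / 2 normalises for the Legendre orthogonality relation.
        coefs[k] = (2 * k + 1) / 2 * np.dot(power, moments[: k + 1])
    return coefs


def evaluate_density(coefs, grid):
    """Evaluate the truncated Legendre series at the given points."""
    return L.legval(grid, coefs)


# Illustrative usage on simulated data (a Beta sample mapped to [-1, 1]).
rng = np.random.default_rng(0)
sample = 2.0 * rng.beta(2.0, 5.0, size=5000) - 1.0
coefs = moment_density(sample, order=6)
density = evaluate_density(coefs, np.linspace(-1.0, 1.0, 201))
```

By construction the estimate integrates to one and reproduces the sample moments up to the chosen order, but a truncated series can dip below zero in the tails. The order is fixed by hand here, whereas the abstract describes a computational strategy for selecting such parameters from the sample.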
Publisher
Springer Science and Business Media LLC
Subject
Computational Theory and Mathematics; Statistics, Probability and Uncertainty; Statistics and Probability; Theoretical Computer Science