Abstract
Data mining provides practical tools for handling large datasets drawn from multiple sources. However, there is limited research on retrieving statistical information when the data are confidential and cannot be shared directly. In this paper, we address this problem and propose a framework for analysing data from multiple sources without revealing the true values, thereby preserving privacy. The proposed framework has three steps. First, data custodians individually mask their data before publishing. Second, the collection of masked data is used to reconstruct the density function of the original dataset, from which resampled values are generated. Third, existing data mining techniques are applied directly to the resampled data. An essential component of the framework is the reconstruction of the original density function from noise-masked data using the moment-based density estimation method. Simulation studies show that the proposed framework performs well: analysis results from the resampled data are comparable to those from the original data when the density of the original data is estimated well. The framework is demonstrated in a clustering analysis of a real-life Australian soybean dataset. Results from the k-means algorithm with two and three fitted clusters show that cluster analysis using the resampled data closely replicates that of the original data.
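The mask, reconstruct, and resample steps can be illustrated with a minimal sketch. It is not the paper's method: it assumes additive, zero-mean Gaussian masking noise with a publicly known standard deviation, and it replaces the moment-based density estimator with a much cruder stand-in that matches only the first two moments and resamples from a normal density. All names and values (`original`, `noise_sd`, the toy data) are hypothetical.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical confidential values held by one data custodian (toy data).
original = rng.normal(loc=5.0, scale=2.0, size=10_000)

# Step 1 (assumed additive masking): publish x + e, where e is independent
# zero-mean noise whose standard deviation is public.
noise_sd = 1.5
masked = original + rng.normal(loc=0.0, scale=noise_sd, size=original.size)

# Step 2 (stand-in for a moment-based density estimate): recover the first
# two moments of the original data from the masked data alone.
est_mean = masked.mean()                        # E[X + e] = E[X] because E[e] = 0
est_var = max(masked.var() - noise_sd**2, 0.0)  # Var(X + e) = Var(X) + Var(e)

# Resample from a normal density with the recovered moments. Only the mean
# and variance are matched here, not the full shape of the density.
resampled = rng.normal(loc=est_mean, scale=np.sqrt(est_var), size=original.size)

# Step 3 would apply an existing data mining technique (for example k-means)
# to `resampled` instead of `original`.
print(f"original : mean={original.mean():.3f}, var={original.var():.3f}")
print(f"recovered: mean={est_mean:.3f}, var={est_var:.3f}")
```

The design point is that the analyst sees only `masked` and `noise_sd`, never `original`; the last two lines compare against the original values purely to check the sketch.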
Subject
General Mathematics, Engineering (miscellaneous), Computer Science (miscellaneous)