HT-Fed-GAN: Federated Generative Model for Decentralized Tabular Data Synthesis

Author:

Duan Shaoming,Liu Chuanyi,Han Peiyi,Jin Xiaopeng,Zhang XinyiORCID,He Tianyu,Pan Hezhong,Xiang Xiayu

Abstract

In this paper, we study the problem of privacy-preserving data synthesis (PPDS) for tabular data in a distributed multi-party environment. In a decentralized setting, for PPDS, federated generative models with differential privacy are used by the existing methods. Unfortunately, the existing models apply only to images or text data and not to tabular data. Unlike images, tabular data usually consist of mixed data types (discrete and continuous attributes) and real-world datasets with highly imbalanced data distributions. Existing methods hardly model such scenarios due to the multimodal distributions in the decentralized continuous columns and highly imbalanced categorical attributes of the clients. To solve these problems, we propose a federated generative model for decentralized tabular data synthesis (HT-Fed-GAN). There are three important parts of HT-Fed-GAN: the federated variational Bayesian Gaussian mixture model (Fed-VB-GMM), which is designed to solve the problem of multimodal distributions; federated conditional one-hot encoding with conditional sampling for global categorical attribute representation and rebalancing; and a privacy consumption-based federated conditional GAN for privacy-preserving decentralized data modeling. The experimental results on five real-world datasets show that HT-Fed-GAN obtains the best trade-off between the data utility and privacy level. For the data utility, the tables generated by HT-Fed-GAN are the most statistically similar to the original tables and the evaluation scores show that HT-Fed-GAN outperforms the state-of-the-art model in terms of machine learning tasks.

Funder

National Natural Science Foundation of China

Publisher

MDPI AG

Subject

General Physics and Astronomy

Reference39 articles.

1. Scaling up the accuracy of naive-bayes classifiers: A decision-tree hybrid;Kohavi;KDD 1996 Proceedings,1996

2. McFee, B., Bertin-Mahieux, T., Ellis, D.P., and Lanckriet, G.R. (2012, January 16–20). The million song dataset challenge. Proceedings of the 21st International Conference on World Wide Web, Lyon, France.

3. Shi, B., Yao, C., Liao, M., Yang, M., Xu, P., Cui, L., Belongie, S., Lu, S., and Bai, X. (2017, January 9–15). ICDAR2017 competition on reading chinese text in the wild (RCTW-17). Proceedings of the 2017 14th IAPR International Conference on Document Analysis and Recognition (ICDAR), Kyoto, Japan.

4. Park, N., Mohammadi, M., Gorde, K., Jajodia, S., Park, H., and Kim, Y. (2018, January 27–31). Data synthesis based on generative adversarial networks. Proceedings of the VLDB Endowment 2018, Rio de Janeiro, Brazil.

5. Jordon, J., Yoon, J., and Van Der Schaar, M. (2019, January 6–9). PATE-GAN: Generating synthetic data with differential privacy guarantees. Proceedings of the International Conference on Learning Representations, New Orleans, OR, USA.

Cited by 5 articles. 订阅此论文施引文献 订阅此论文施引文献,注册后可以免费订阅5篇论文的施引文献,订阅后可以查看论文全部施引文献

1. SCGAN: Semi-Centralized Generative Adversarial Network for image generation in distributed scenes;Information Fusion;2024-12

2. FLIGAN;Proceedings of the 7th International Workshop on Edge Systems, Analytics and Networking;2024-04-22

3. Federated learning for generating synthetic data: a scoping review;International Journal of Population Data Science;2023-10-31

4. Attribute-Centric and Synthetic Data Based Privacy Preserving Methods: A Systematic Review;Journal of Cybersecurity and Privacy;2023-09-11

5. Securing Federated GANs: Enabling Synthetic Data Generation for Health Registry Consortiums;Proceedings of the 18th International Conference on Availability, Reliability and Security;2023-08-29

同舟云学术

1.学者识别学者识别

2.学术分析学术分析

3.人才评估人才评估

"同舟云学术"是以全球学者为主线,采集、加工和组织学术论文而形成的新型学术文献查询和分析系统,可以对全球学者进行文献检索和人才价值评估。用户可以通过关注某些学科领域的顶尖人物而持续追踪该领域的学科进展和研究前沿。经过近期的数据扩容,当前同舟云学术共收录了国内外主流学术期刊6万余种,收集的期刊论文及会议论文总量共计约1.5亿篇,并以每天添加12000余篇中外论文的速度递增。我们也可以为用户提供个性化、定制化的学者数据。欢迎来电咨询!咨询电话:010-8811{复制后删除}0370

www.globalauthorid.com

TOP

Copyright © 2019-2024 北京同舟云网络信息技术有限公司
京公网安备11010802033243号  京ICP备18003416号-3