On Privacy Protection of Latent Dirichlet Allocation Model Training

Published: 2019-08
Container-title: Proceedings of the Twenty-Eighth International Joint Conference on Artificial Intelligence

Author:

Zhao Fangyuan¹²³, Ren Xuebin¹², Yang Shusen²³, Yang Xinyu¹²

Affiliation:

1. School of Computer Science and Technology, Xi’an Jiaotong University, China

2. National Engineering Laboratory for Big Data Analytics, Xi’an Jiaotong University, China

3. Ministry of Education Key Lab For Intelligent Networks and Network Security, Xi’an Jiaotong University, China

Abstract

Latent Dirichlet Allocation (LDA) is a popular topic modeling technique for discovering the hidden semantic structure of text datasets, and it plays a fundamental role in many machine learning applications. However, as with many other machine learning algorithms, training an LDA model may leak sensitive information about the training datasets and pose significant privacy risks. To mitigate these privacy issues, this paper studies privacy-preserving algorithms for LDA model training. In particular, we first develop a privacy monitoring algorithm to investigate the privacy guarantee obtained from the inherent randomness of the Collapsed Gibbs Sampling (CGS) process in a typical LDA training algorithm on centralized curated datasets. We then propose a locally private LDA training algorithm on crowdsourced data that provides local differential privacy for individual data contributors. Experimental results on real-world datasets demonstrate the effectiveness of the proposed algorithms.
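The abstract does not spell out the paper's algorithms. As background only, here is a minimal sketch of the standard collapsed Gibbs sampler for LDA. It is the textbook update, not the authors' privacy monitoring or locally private algorithm. The sketch shows where the "inherent randomness" of CGS comes from. Each token's topic is removed from the count tables and then resampled from the full conditional p(z_i = k | z_-i, w) ∝ (n_dk + α)(n_kw + β) / (n_k + Vβ), where n_dk, n_kw and n_k are the document-topic, topic-word and per-topic counts, and V is the vocabulary size. The function name `collapsed_gibbs_lda`, the default hyperparameters, and the integer word-id input format are illustrative assumptions.

```python
import random


def collapsed_gibbs_lda(docs, num_topics, vocab_size, alpha=0.1, beta=0.01,
                        iterations=50, seed=0):
    """Textbook collapsed Gibbs sampling for LDA (illustrative sketch only).

    Assumption: docs is a list of documents, each a list of integer word ids
    in [0, vocab_size). Returns (doc_topic, topic_word) count tables after
    the final sweep.
    """
    rng = random.Random(seed)
    doc_topic = [[0] * num_topics for _ in docs]            # n_dk
    topic_word = [[0] * vocab_size for _ in range(num_topics)]  # n_kw
    topic_total = [0] * num_topics                          # n_k

    # Random initial topic assignment for every token.
    assignments = []
    for d, doc in enumerate(docs):
        z_doc = []
        for w in doc:
            k = rng.randrange(num_topics)
            z_doc.append(k)
            doc_topic[d][k] += 1
            topic_word[k][w] += 1
            topic_total[k] += 1
        assignments.append(z_doc)

    for _ in range(iterations):
        for d, doc in enumerate(docs):
            for i, w in enumerate(doc):
                # Remove the current token to obtain the "-i" counts.
                k = assignments[d][i]
                doc_topic[d][k] -= 1
                topic_word[k][w] -= 1
                topic_total[k] -= 1

                # Unnormalized full conditional over topics for this token.
                weights = [
                    (doc_topic[d][t] + alpha)
                    * (topic_word[t][w] + beta)
                    / (topic_total[t] + vocab_size * beta)
                    for t in range(num_topics)
                ]
                # Random draw: the sampling noise a privacy analysis of CGS examines.
                k = rng.choices(range(num_topics), weights=weights)[0]

                # Add the token back under its newly sampled topic.
                assignments[d][i] = k
                doc_topic[d][k] += 1
                topic_word[k][w] += 1
                topic_total[k] += 1

    return doc_topic, topic_word
```

For example, `collapsed_gibbs_lda([[0, 1, 2, 0], [3, 4, 3, 5]], num_topics=2, vocab_size=6)` returns the count tables after the last sweep. Smoothing them with alpha and beta and normalizing gives the usual document-topic and topic-word estimates. The sketch does not quantify how much each sampling step reveals about individual tokens. That question is the one the privacy monitoring described above addresses.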

Publisher

International Joint Conferences on Artificial Intelligence Organization

Cited by 5 articles.

1. Exploring Varied Experiences of Foreign and Domestic Tourists at Borobudur through LDA Topic Analysis;Ivan;2024 3rd International Conference on Digital Transformation and Applications (ICDXA);2024-01-29

2. FDP-LDA: Inherent Privacy Amplification of Collapsed Gibbs Sampling via Group Subsampling;Web and Big Data;2023

3. Improving Parameter Estimation and Defensive Ability of Latent Dirichlet Allocation Model Training Under Rényi Differential Privacy;Journal of Computer Science and Technology;2022-11-30

4. Generative Adversarial Attack on Ensemble Clustering;2022 IEEE/CVF Winter Conference on Applications of Computer Vision (WACV);2022-01

5. Privacy Preserving Text Data Encoding and Topic Modelling;2021 IEEE International Conference on Big Data (Big Data);2021-12-15