Affiliation:
1. AI Group, WeBank Co., Ltd., China
2. State Key Laboratory of Software Development Environment, Beihang University, China
3. Department of Computer Science and Engineering, The Hong Kong University of Science and Technology, Hong Kong
Abstract
Probabilistic topic modeling has been applied in a variety of industrial applications. Training a high-quality model usually requires a massive amount of data to provide comprehensive co-occurrence information for the model to learn. However, industrial data such as medical or financial records are often proprietary or sensitive, which precludes uploading to data centers. Hence, training topic models in industrial scenarios using conventional approaches faces a dilemma: A party (i.e., a company or institute) has to either tolerate data scarcity or sacrifice data privacy. In this article, we propose a framework named Industrial Federated Topic Modeling (iFTM), in which multiple parties collaboratively train a high-quality topic model by simultaneously alleviating data scarcity and maintaining immunity to privacy adversaries. iFTM is inspired by federated learning, supports two representative topic models (i.e., Latent Dirichlet Allocation and SentenceLDA) in industrial applications, and consists of novel techniques such as private Metropolis-Hastings, topic-wise normalization, and heterogeneous model integration. We conduct quantitative evaluations to verify the effectiveness of iFTM and deploy iFTM in two real-life applications to demonstrate its utility. Experimental results verify iFTM's superiority over conventional topic modeling.
Funder
State Key Laboratory of Software Development Environment (Beihang University) Open Program
National Science Foundation of China
National Key Research and Development Program of China
Publisher
Association for Computing Machinery (ACM)
Subject
Artificial Intelligence, Theoretical Computer Science
Cited by: 12 articles.