Abstract
Previous methods of distributed Gibbs sampling for LDA run into either memory or communication bottlenecks. To improve scalability, we propose four strategies: data placement, pipeline processing, word bundling, and priority-based scheduling. Experiments show that our strategies significantly reduce the unparallelizable communication bottleneck and achieve good load balancing, and hence improve the scalability of LDA.
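The abstract does not spell out how the strategies fit together, but two of them, word bundling and priority-based scheduling, can be illustrated with a minimal Python sketch: words are packed into bundles of roughly equal token count (so each bundle represents comparable sampling work), and bundles are then greedily assigned, heaviest first, to the least-loaded worker. All names here (`make_bundles`, `schedule_bundles`) are hypothetical illustrations, not the paper's API, and the greedy longest-processing-time-first heuristic is one plausible reading of "priority-based scheduling", not necessarily the authors' exact method.

```python
import heapq
from collections import Counter

def make_bundles(word_counts, target_tokens):
    """Word bundling: group vocabulary words into bundles whose total
    token counts are roughly equal, so each bundle represents a
    comparable amount of Gibbs-sampling work."""
    bundles, current, load = [], [], 0
    # Pack frequent words first so bundle loads stay balanced.
    for word, count in sorted(word_counts.items(), key=lambda kv: -kv[1]):
        current.append(word)
        load += count
        if load >= target_tokens:
            bundles.append((load, current))
            current, load = [], 0
    if current:
        bundles.append((load, current))
    return bundles

def schedule_bundles(bundles, num_workers):
    """Priority-based scheduling: repeatedly give the heaviest
    remaining bundle to the currently least-loaded worker
    (greedy longest-processing-time-first load balancing)."""
    heap = [(0, w, []) for w in range(num_workers)]  # (load, id, bundles)
    heapq.heapify(heap)
    for load, words in sorted(bundles, reverse=True):
        worker_load, worker_id, assigned = heapq.heappop(heap)
        assigned.append(words)
        heapq.heappush(heap, (worker_load + load, worker_id, assigned))
    return sorted(heap, key=lambda item: item[1])

if __name__ == "__main__":
    # Toy corpus word frequencies (hypothetical data).
    counts = Counter({"model": 90, "topic": 70, "word": 50,
                      "data": 30, "gibbs": 20, "lda": 10})
    bundles = make_bundles(counts, target_tokens=80)
    for load, worker_id, assigned in schedule_bundles(bundles, 2):
        print(f"worker {worker_id}: load={load}, bundles={assigned}")
```

The point of bundling before scheduling is that individual word frequencies follow a heavy-tailed distribution; equal-work bundles give the scheduler coarser, more uniform units, which makes greedy assignment far less likely to leave one worker with a single very expensive word.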
Publisher
Association for Computing Machinery (ACM)
Subject
Artificial Intelligence, Theoretical Computer Science
Cited by
82 articles.