Parallel EM algorithm on Hadoop for large-scale hidden Markov model parameter estimation


Li Xi, Wang Lizhi, Zhang Liyong


This paper proposes a Hadoop-based parallel EM algorithm for parameter estimation of large-scale hidden Markov models (HMMs). The HMM is a widely used statistical model, but estimating its parameters requires storing and processing large data sets, which makes traditional serial algorithms inefficient. Using the Hadoop parallel computing framework, the estimation task is divided into multiple subtasks through the MapReduce programming model and assigned to different machines for parallel computation, which improves the efficiency and scalability of parameter estimation. The results show that running the parallel EM algorithm on Hadoop is a feasible and effective way to estimate the parameters of large-scale HMMs.
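The abstract does not give the algorithm itself, so the following is only a minimal single-process sketch of how one EM (Baum-Welch) iteration might be split along MapReduce lines, not the authors' implementation. It assumes a discrete-observation HMM with strictly positive parameters and sequences of length at least two. The names `map_sequence`, `reduce_stats`, and `em_iteration` are hypothetical; on a real cluster the map and reduce functions would run as Hadoop tasks rather than as plain Python calls.

```python
import math
from functools import reduce

def map_sequence(seq, pi, A, B):
    """Map step (hypothetical): expected sufficient statistics for one sequence."""
    n, T, m = len(pi), len(seq), len(B[0])
    # Scaled forward pass; scale[t] is the normalizer at time t.
    alpha, scale = [], []
    for t, o in enumerate(seq):
        if t == 0:
            row = [pi[i] * B[i][o] for i in range(n)]
        else:
            prev = alpha[-1]
            row = [sum(prev[i] * A[i][j] for i in range(n)) * B[j][o]
                   for j in range(n)]
        c = sum(row)
        scale.append(c)
        alpha.append([v / c for v in row])
    # Scaled backward pass, reusing the forward normalizers.
    beta = [[1.0] * n for _ in range(T)]
    for t in range(T - 2, -1, -1):
        o = seq[t + 1]
        for i in range(n):
            beta[t][i] = sum(A[i][j] * B[j][o] * beta[t + 1][j]
                             for j in range(n)) / scale[t + 1]
    # Accumulate state posteriors (gamma) and transition posteriors (xi).
    init = [alpha[0][i] * beta[0][i] for i in range(n)]
    trans = [[0.0] * n for _ in range(n)]
    emit = [[0.0] * m for _ in range(n)]
    for t, o in enumerate(seq):
        for i in range(n):
            emit[i][o] += alpha[t][i] * beta[t][i]
            if t + 1 < T:
                for j in range(n):
                    trans[i][j] += (alpha[t][i] * A[i][j] * B[j][seq[t + 1]]
                                    * beta[t + 1][j] / scale[t + 1])
    loglik = sum(math.log(c) for c in scale)
    return {"init": init, "trans": trans, "emit": emit, "loglik": loglik}

def reduce_stats(a, b):
    """Reduce step (hypothetical): element-wise sum of two partial statistics."""
    add = lambda x, y: [u + v for u, v in zip(x, y)]
    return {"init": add(a["init"], b["init"]),
            "trans": [add(r, s) for r, s in zip(a["trans"], b["trans"])],
            "emit": [add(r, s) for r, s in zip(a["emit"], b["emit"])],
            "loglik": a["loglik"] + b["loglik"]}

def normalize(row):
    total = sum(row)
    return [v / total for v in row]

def em_iteration(sequences, pi, A, B):
    """One EM iteration: map over sequences, reduce, then the M-step."""
    stats = reduce(reduce_stats,
                   (map_sequence(s, pi, A, B) for s in sequences))
    new_pi = normalize(stats["init"])
    new_A = [normalize(r) for r in stats["trans"]]
    new_B = [normalize(r) for r in stats["emit"]]
    # loglik is the log-likelihood under the parameters passed in.
    return new_pi, new_A, new_B, stats["loglik"]
```

Because the per-sequence statistics are simply added together, the reduce step is associative and commutative, so partial sums can be combined in any order across machines. Only the small aggregated statistics, not the raw sequences, need to reach the M-step.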


Darcy & Roy Press Co. Ltd.







