A new algorithm to train hidden Markov models for biological sequences with partial labels

Published:2021-03-26 Issue:1 Volume:22
ISSN:1471-2105
Container-title:BMC Bioinformatics
language:en

Author:

Li Jiefu, Lee Jung-Youn, Liao Li

Abstract

Background: Hidden Markov models (HMMs) are a powerful tool for analyzing biological sequences in a wide variety of applications, from profiling functional protein families to identifying functional domains. HMMs are typically trained either by maximum likelihood using counting, when sequences are labelled, or by expectation maximization, such as the Baum–Welch algorithm, when sequences are unlabelled. Increasingly, however, sequences are only partially labelled. In this paper, we design a new training method based on the Baum–Welch algorithm to train HMMs when only partial labelling is available for certain biological problems.

Results: Compared with a similar, previously reported method designed for active learning in text mining, our method achieves significant improvements in model training, as demonstrated by higher accuracy when the trained models are tested for decoding on both synthetic and real data.

Conclusions: A novel training method is developed to improve the training of hidden Markov models by utilizing partially labelled data. The method will have an impact on detecting de novo motifs and signals in biological sequence data. In particular, the method will be deployed in active learning mode in ongoing research on detecting plasmodesmata targeting signals, and its performance will be assessed with validation from wet-lab experiments.
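To make the idea of training with partial labels more concrete, the following is a minimal sketch of one generic way to use them in the expectation step of Baum–Welch: at each labelled position, only the labelled state is allowed, while unlabelled positions are summed over all states as usual. This is an illustration under stated assumptions and is not necessarily the algorithm proposed in the paper. The function name `constrained_posteriors`, the parameter layout, and the use of `None` to mark unlabelled positions are all hypothetical choices made for this example.

```python
def constrained_posteriors(obs, labels, start, trans, emit):
    """Forward-backward pass with labelled positions clamped to their state.

    Hypothetical helper for illustration only (not the paper's method).
    obs:    list of observation indices
    labels: list of state indices, or None where a position is unlabelled
    start:  start[i]    = P(state_0 = i)
    trans:  trans[i][j] = P(state_{t+1} = j | state_t = i)
    emit:   emit[i][o]  = P(obs = o | state = i)
    Returns (likelihood, gamma), where gamma[t][i] is the posterior
    probability of state i at position t given obs and the labels.
    No scaling is applied, so long sequences may underflow, and labels
    that are impossible under the model give a zero likelihood.
    """
    n_states, length = len(start), len(obs)

    def allowed(t, i):
        # A state is allowed if the position is unlabelled or matches the label.
        return labels[t] is None or labels[t] == i

    # Forward pass: alpha[t][i] is zero for states that contradict a label.
    alpha = [[0.0] * n_states for _ in range(length)]
    for i in range(n_states):
        if allowed(0, i):
            alpha[0][i] = start[i] * emit[i][obs[0]]
    for t in range(1, length):
        for j in range(n_states):
            if allowed(t, j):
                total = sum(alpha[t - 1][i] * trans[i][j] for i in range(n_states))
                alpha[t][j] = total * emit[j][obs[t]]

    # Backward pass: only sum over successor states consistent with the labels.
    beta = [[1.0] * n_states for _ in range(length)]
    for t in range(length - 2, -1, -1):
        for i in range(n_states):
            beta[t][i] = sum(
                trans[i][j] * emit[j][obs[t + 1]] * beta[t + 1][j]
                for j in range(n_states)
                if allowed(t + 1, j)
            )

    likelihood = sum(alpha[length - 1])
    gamma = [
        [alpha[t][i] * beta[t][i] / likelihood for i in range(n_states)]
        for t in range(length)
    ]
    return likelihood, gamma


# Example: two states, two symbols, with only the middle position labelled.
start = [0.6, 0.4]
trans = [[0.7, 0.3], [0.4, 0.6]]
emit = [[0.9, 0.1], [0.2, 0.8]]
likelihood, gamma = constrained_posteriors([0, 1, 0], [None, 1, None], start, trans, emit)
```

With all labels set to `None`, the sketch reduces to the ordinary forward-backward computation used for unlabelled sequences; with every position labelled, the posteriors become one-hot and the expected counts match the direct counting used for fully labelled sequences.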

Funder

National Science Foundation

Publisher

Springer Science and Business Media LLC

Subject

Applied Mathematics, Computer Science Applications, Molecular Biology, Biochemistry, Structural Biology

Link

http://link.springer.com/content/pdf/10.1186/s12859-021-04080-0.pdf


Cited by 12 articles.

1. Killer whale respiration rates;PLOS ONE;2024-05-15

2. Traffic trajectory data analysis technology based on HMM model map matching algorithm;PLOS ONE;2024-05-08

3. A Survey on Syntactic Pattern Recognition Methods in Bioinformatics;Computer Science;2024-03-10

4. May the privacy be with us: Correlated differential privacy in location data for ITS;Computer Networks;2024-03

5. An expectation maximization algorithm for the hidden markov models with multiparameter student-t observations;Computational Statistics;2023-12-06