Perturbative formulation of general continuous-time Markov model of sequence evolution via insertions/deletions, Part I: Theoretical basis-Reference-Cited by-同舟云学术

Perturbative formulation of general continuous-time Markov model of sequence evolution via insertions/deletions, Part I: Theoretical basis

Published:2015-07-31 Issue: Volume: Page:
ISSN:
Container-title:
language:
Short-container-title:

Author:

Ezawa Kiyoshi^ORCID,Graur Dan,Landan Giddy

Abstract

AbstractBackgroundInsertions and deletions (indels) account for more nucleotide differences between two related DNA sequences than substitutions do, and thus it is imperative to develop a stochastic evolutionary model that enables us to reliably calculate the probability of the sequence evolution through indel processes. Recently, such probabilistic models are mostly based on either hidden Markov models (HMMs) or transducer theories, both of which give the indel component of the probability of a given sequence alignment as a product of either probabilities of column-to-column transitions or block-wise contributions along the alignment. However, it is not a priori clear how these models are related with any genuine stochastic evolutionary model, which describes the stochastic evolution of an entire sequence along the time-axis. Moreover, none of these models can fully accommodate biologically realistic features, such as overlapping indels, power-law indel-length distributions, and indel rate variation across regions.ResultsHere, we theoretically tackle the ab initio calculation of the probability of a given sequence alignment under a genuine evolutionary model, more specifically, a general continuous-time Markov model of the evolution of an entire sequence via insertions and deletions. Our model allows general indel rate parameters including length distributions but does not impose any unrealistic restrictions on indels. Using techniques of the perturbation theory in physics, we expand the probability into a series over different numbers of indels. Our derivation of this perturbation expansion elegantly bridges the gap between Gillespie’s (1977) intuitive derivation of his own stochastic simulation method, which is now widely used in evolutionary simulators, and Feller’s (1940) mathematically rigorous theorems that underpin Gillespie′s method. We find a sufficient and nearly necessary set of conditions under which the probability can be expressed as the product of an overall factor and the contributions from regions separated by gapless columns of the alignment. The indel models satisfying these conditions include those with some kind of rate variation across regions, as well as space-homogeneous models. We also prove that, though with a caveat, pairwise probabilities calculated by the method of Miklós et al. (2004) are equivalent to those calculated by our ab initio formulation, at least under a space-homogenous model.ConclusionsOur ab initio perturbative formulation provides a firm theoretical ground that other indel models can rest on.[This paper and three other papers (Ezawa, Graur and Landan 2015a,b,c) describe a series of our efforts to develop, apply, and extend the ab initio perturbative formulation of a general continuous-time Markov model of indels.]

Publisher

Cold Spring Harbor Laboratory

Reference81 articles.

1. Empirical and Structural Models for Insertions and Deletions in the Divergent Evolution of Proteins

2. Maximum likelihood alignment of DNA sequences

3. Evolutionary inference via the Poisson Indel Process

4. A note on probabilistic models over strings: The linear algebra approach;Bull Math Biol,2013

5. Transducers: an emerging probabilistic framework for modeling indels on trees

Cited by 5 articles. 订阅此论文施引文献订阅此论文施引文献，注册后可以免费订阅5篇论文的施引文献，订阅后可以查看论文全部施引文献

1. General continuous-time Markov model of sequence evolution via insertions/deletions: local alignment probability computation;BMC Bioinformatics;2016-09-27

2. General continuous-time Markov model of sequence evolution via insertions/deletions: are alignment probabilities factorable?;BMC Bioinformatics;2016-09-17

3. Characterization of multiple sequence alignment errors using complete-likelihood score and position-shift map;BMC Bioinformatics;2016-03-18

4. Perturbative formulation of general continuous-time Markov model of sequence evolution via insertions/deletions, Part III: Algorithm for first approximation;2015-07-31

5. Perturbative formulation of general continuous-time Markov model of sequence evolution via insertions/deletions, Part II: Perturbation analyses;2015-07-31