Perturbative formulation of general continuous-time Markov model of sequence evolution via insertions/deletions, Part I: Theoretical basis

Author:

Ezawa KiyoshiORCID,Graur Dan,Landan Giddy

Abstract

AbstractBackgroundInsertions and deletions (indels) account for more nucleotide differences between two related DNA sequences than substitutions do, and thus it is imperative to develop a stochastic evolutionary model that enables us to reliably calculate the probability of the sequence evolution through indel processes. Recently, such probabilistic models are mostly based on either hidden Markov models (HMMs) or transducer theories, both of which give the indel component of the probability of a given sequence alignment as a product of either probabilities of column-to-column transitions or block-wise contributions along the alignment. However, it is not a priori clear how these models are related with any genuine stochastic evolutionary model, which describes the stochastic evolution of an entire sequence along the time-axis. Moreover, none of these models can fully accommodate biologically realistic features, such as overlapping indels, power-law indel-length distributions, and indel rate variation across regions.ResultsHere, we theoretically tackle the ab initio calculation of the probability of a given sequence alignment under a genuine evolutionary model, more specifically, a general continuous-time Markov model of the evolution of an entire sequence via insertions and deletions. Our model allows general indel rate parameters including length distributions but does not impose any unrealistic restrictions on indels. Using techniques of the perturbation theory in physics, we expand the probability into a series over different numbers of indels. Our derivation of this perturbation expansion elegantly bridges the gap between Gillespie’s (1977) intuitive derivation of his own stochastic simulation method, which is now widely used in evolutionary simulators, and Feller’s (1940) mathematically rigorous theorems that underpin Gillespie′s method. We find a sufficient and nearly necessary set of conditions under which the probability can be expressed as the product of an overall factor and the contributions from regions separated by gapless columns of the alignment. The indel models satisfying these conditions include those with some kind of rate variation across regions, as well as space-homogeneous models. We also prove that, though with a caveat, pairwise probabilities calculated by the method of Miklós et al. (2004) are equivalent to those calculated by our ab initio formulation, at least under a space-homogenous model.ConclusionsOur ab initio perturbative formulation provides a firm theoretical ground that other indel models can rest on.[This paper and three other papers (Ezawa, Graur and Landan 2015a,b,c) describe a series of our efforts to develop, apply, and extend the ab initio perturbative formulation of a general continuous-time Markov model of indels.]

Publisher

Cold Spring Harbor Laboratory

同舟云学术

1.学者识别学者识别

2.学术分析学术分析

3.人才评估人才评估

"同舟云学术"是以全球学者为主线,采集、加工和组织学术论文而形成的新型学术文献查询和分析系统,可以对全球学者进行文献检索和人才价值评估。用户可以通过关注某些学科领域的顶尖人物而持续追踪该领域的学科进展和研究前沿。经过近期的数据扩容,当前同舟云学术共收录了国内外主流学术期刊6万余种,收集的期刊论文及会议论文总量共计约1.5亿篇,并以每天添加12000余篇中外论文的速度递增。我们也可以为用户提供个性化、定制化的学者数据。欢迎来电咨询!咨询电话:010-8811{复制后删除}0370

www.globalauthorid.com

TOP

Copyright © 2019-2024 北京同舟云网络信息技术有限公司
京公网安备11010802033243号  京ICP备18003416号-3