Author:
Katriel Guy, Mahanaymi Udi, Koutschan Christoph, Zeilberger Doron, Steel Mike, Snir Sagi
Abstract
The genomic era has opened up vast opportunities in molecular systematics, one of which is deciphering evolutionary history in fine detail. Under this mass of data, analysing the point mutations of standard markers is too crude and slow for fine-scale phylogenetics. Genome dynamics events, by contrast, provide far richer information. The synteny index (SI) between a pair of genomes combines gene order and gene content information, allowing the comparison of genomes of unequal gene content, together with order considerations of their common genes. Recently, genome dynamics has been modelled as a continuous-time Markov process, and gene distance in the genome as a birth–death–immigration process. However, due to complexities arising in this setting, such as overlapping neighbourhoods and other confounding factors, no precise and provably consistent estimators could be derived.
Here, we extend this modelling approach by using techniques from birth–death theory to derive explicit expressions of the system’s probabilistic dynamics in the form of rational functions of the model parameters. This, in turn, allows us to infer analytically the expected distances between organisms based on a transformation of their SI. Despite the complexity of the expressions obtained, we establish additivity of this estimated evolutionary distance (a desirable property yielding phylogenetic consistency).
Applying the new measure in simulation studies shows that it attains very accurate results in realistic settings and even under model extensions. In the real-data realm, we applied the new formulation to a unique data structure that we constructed, the ordered orthology DB, based on a new version of the EggNOG database, to construct a tree with more than 4.5K taxa. The resulting tree was compared with the NCBI taxonomy for these organisms. To the best of our knowledge, this is the largest gene-order-based tree constructed, and it overcomes flaws found in previous approaches.
Publisher
Cold Spring Harbor Laboratory
Cited by
1 article.