Counting and sampling gene family evolutionary histories in the duplication-loss and duplication-loss-transfer models

Author:

Chauve Cedric, Ponty Yann, Wallner Michael

Abstract

Given a set of species whose evolution is represented by a species tree, a gene family is a group of genes having evolved from a single ancestral gene. A gene family evolves along the branches of a species tree through various mechanisms, including, but not limited to, speciation ($\mathbb{S}$), gene duplication ($\mathbb{D}$), gene loss ($\mathbb{L}$), and horizontal gene transfer ($\mathbb{T}$). The reconstruction of a gene tree representing the evolution of a gene family constrained by a species tree is an important problem in phylogenomics. However, unlike in the multispecies coalescent evolutionary model that considers only speciation and incomplete lineage sorting events, very little is known about the search space for gene family histories accounting for gene duplication, gene loss and horizontal gene transfer (the $\mathbb{D}\mathbb{L}\mathbb{T}$-model). In this work, we introduce the notion of an evolutionary history, defined as a binary ordered rooted tree describing the evolution of a gene family, constrained by a species tree in the $\mathbb{D}\mathbb{L}\mathbb{T}$-model. We provide formal grammars describing the set of all evolutionary histories that are compatible with a given species tree, whether it is ranked or unranked. These grammars allow us, using either analytic combinatorics or dynamic programming, to efficiently compute the number of histories of a given size, and also to generate random histories of a given size under the uniform distribution. We apply these tools to obtain exact asymptotics for the number of gene family histories for two species trees, the rooted caterpillar and complete binary tree, as well as estimates of the range of the exponential growth factor of the number of histories for random species trees of size up to 25. Our results show that including horizontal gene transfers induces a dramatic increase in the number of evolutionary histories. We also show that, within ranked species trees, the number of evolutionary histories in the $\mathbb{D}\mathbb{L}\mathbb{T}$-model is almost independent of the species tree topology. These results establish firm foundations for the development of ensemble methods for the prediction of reconciliations.
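
The counting and uniform sampling steps mentioned in the abstract can be illustrated on a much simpler grammar. The sketch below is not the paper's DLT grammar: it assumes a toy grammar `T -> leaf | node(T, T)` for binary ordered rooted trees, with size measured as the number of leaves. The function names `count`, `sample` and `leaves` are hypothetical. It counts objects of a given size by dynamic programming and then draws one uniformly at random by the standard recursive method, choosing each split with probability proportional to the number of trees it yields.

```python
import random
from functools import lru_cache

# Toy grammar (an assumption, NOT the paper's DLT grammar):
#   T -> leaf | node(T, T), size = number of leaves.

@lru_cache(maxsize=None)
def count(n: int) -> int:
    """Number of binary ordered rooted trees with n leaves (memoized DP)."""
    if n < 1:
        return 0
    if n == 1:
        return 1
    # A tree with n leaves splits into a left subtree with k leaves
    # and a right subtree with n - k leaves.
    return sum(count(k) * count(n - k) for k in range(1, n))

def sample(n: int, rng: random.Random):
    """Draw a tree with n leaves uniformly at random (recursive method)."""
    if n == 1:
        return "leaf"
    # Pick the left-subtree size k with probability
    # count(k) * count(n - k) / count(n).
    r = rng.randrange(count(n))
    for k in range(1, n):
        weight = count(k) * count(n - k)
        if r < weight:
            return (sample(k, rng), sample(n - k, rng))
        r -= weight
    raise AssertionError("unreachable: the weights sum to count(n)")

def leaves(tree) -> int:
    """Number of leaves of a sampled tree, used to check its size."""
    return 1 if tree == "leaf" else leaves(tree[0]) + leaves(tree[1])
```

For example, `count(5)` counts the trees with five leaves, and `sample(10, random.Random(0))` returns a nested tuple with exactly ten `"leaf"` entries. A grammar with more non-terminals would use the same scheme, with one counting table per non-terminal.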

Funder

Austrian Science Fund

NSERC

Austrian Federal Ministry of Education, Science and Research

Publisher

Springer Science and Business Media LLC

Subject

Applied Mathematics, Agricultural and Biological Sciences (miscellaneous), Modeling and Simulation

Cited by 2 articles.

1. Tuning as convex optimisation: a polynomial tuner for multi-parametric combinatorial samplers;Combinatorics, Probability and Computing;2021-12-15

2. Microbial-driven genetic variation in holobionts;FEMS Microbiology Reviews;2021-04-30
