GeneRax: A tool for species tree-aware maximum likelihood based gene family tree inference under gene duplication, transfer, and loss

Authors:

Morel Benoit, Kozlov Alexey M., Stamatakis Alexandros, Szöllősi Gergely J.

Abstract

Inferring phylogenetic trees for individual homologous gene families is difficult because alignments are often too short, and thus contain insufficient signal, while substitution models inevitably fail to capture the complexity of the evolutionary processes. To overcome these challenges, species tree-aware methods also leverage information from a putative species tree. However, only a few methods implement a full likelihood framework or account for horizontal gene transfers. Furthermore, these methods often require expensive data pre-processing (e.g., computing bootstrap trees) and rely on approximations and heuristics that limit the degree of tree space exploration. Here we present GeneRax, the first maximum likelihood species tree-aware phylogenetic inference software. Relying on established maximum likelihood optimization algorithms, it simultaneously accounts for substitutions at the sequence level and for gene-level events such as duplication, transfer, and loss. GeneRax can infer rooted phylogenetic trees for multiple gene families directly from the per-gene sequence alignments and a rooted, yet undated, species tree. We show that on simulated data, GeneRax infers the tree closest to the true tree, in terms of relative Robinson-Foulds distance, in 90% of the simulations when compared to competing tools. On empirical datasets, GeneRax is the fastest of all tested methods when starting from aligned sequences, and it infers the trees with the highest likelihood score under our model. GeneRax completed tree inferences and reconciliations for 1099 Cyanobacteria families in eight minutes on 512 CPU cores; its parallelization scheme thus enables large-scale analyses. GeneRax is available under the GNU GPL at https://github.com/BenoitMorel/GeneRax.
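The accuracy claim above is stated in terms of relative Robinson-Foulds (RF) distance. As a rough illustration of that metric, the sketch below counts the non-trivial bipartitions (splits) present in only one of two trees and divides by the maximum possible count for unrooted binary trees on n leaves, 2(n - 3). It assumes an unrooted comparison over identical leaf sets, and it encodes trees as nested Python tuples of leaf names. This encoding and the function names `leaves`, `bipartitions`, and `relative_rf` are hypothetical choices made for brevity; real tools read Newick, and the paper's own evaluation scripts may normalize differently.

```python
from typing import FrozenSet, Set, Union

# Hypothetical tree encoding: a leaf is a str, an internal node is a tuple of children.
Tree = Union[str, tuple]


def leaves(tree: Tree) -> FrozenSet[str]:
    """Return the set of leaf names below a node."""
    if isinstance(tree, str):
        return frozenset([tree])
    return frozenset().union(*(leaves(child) for child in tree))


def bipartitions(tree: Tree) -> Set[FrozenSet[str]]:
    """Collect non-trivial splits, each stored as the side NOT holding an anchor leaf.

    Storing one canonical side makes the split independent of where the tree is
    rooted, so rooted inputs are compared as if they were unrooted.
    """
    taxa = leaves(tree)
    anchor = min(taxa)
    n = len(taxa)
    splits: Set[FrozenSet[str]] = set()

    def visit(node: Tree) -> FrozenSet[str]:
        if isinstance(node, str):
            return frozenset([node])
        cluster = frozenset().union(*(visit(child) for child in node))
        side = taxa - cluster if anchor in cluster else cluster
        # Non-trivial: both sides of the split contain at least two leaves.
        if 2 <= len(side) <= n - 2:
            splits.add(side)
        return cluster

    visit(tree)
    return splits


def relative_rf(t1: Tree, t2: Tree) -> float:
    """Unrooted RF distance normalized by its maximum 2(n - 3) for binary trees."""
    if leaves(t1) != leaves(t2):
        raise ValueError("trees must share the same leaf set")
    n = len(leaves(t1))
    if n < 4:
        return 0.0  # fewer than 4 leaves: no non-trivial splits exist
    rf = len(bipartitions(t1) ^ bipartitions(t2))
    return rf / (2 * (n - 3))


if __name__ == "__main__":
    true_tree = ((("A", "B"), "C"), ("D", "E"))
    inferred = ((("A", "C"), "B"), ("D", "E"))
    print(relative_rf(true_tree, inferred))  # prints 0.5
```

For the two five-leaf trees in the usage example, each tree has two non-trivial splits, they share {D, E}, and they differ in one split each, so RF = 2 and the relative distance is 2 / 4 = 0.5. A relative RF of 0 means the topologies are identical, and 1 means they share no non-trivial split.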

Publisher

Cold Spring Harbor Laboratory

