The effect of alignment uncertainty, substitution models and priors in building and dating the mammal tree of life

Author:

Du Yan,Wu Shaoyuan,Edwards Scott V.,Liu LiangORCID

Abstract

Abstract Background The flood of genomic data to help build and date the tree of life requires automation at several critical junctures, most importantly during sequence assembly and alignment. It is widely appreciated that automated alignment protocols can yield inaccuracies, but the relative impact of various sources error on phylogenomic analysis is not yet known. This study employs an updated mammal data set of 5162 coding loci sampled from 90 species to evaluate the effects of alignment uncertainty, substitution models, and fossil priors on gene tree, species tree, and divergence time estimation. Additionally, a novel coalescent likelihood ratio test is introduced for comparing competing species trees against a given set of gene trees. Results The aligned DNA sequences of 5162 loci from 90 species were trimmed and filtered using trimAL and two filtering protocols. The final dataset contains 4 sets of alignments - before trimming, after trimming, filtered by a recently proposed pipeline, and further filtered by comparing ML gene trees for each locus with the concatenation tree. Our analyses suggest that the average discordance among the coalescent trees is significantly smaller than that among the concatenation trees estimated from the 4 sets of alignments or with different substitution models. There is no significant difference among the divergence times estimated with different substitution models. However, the divergence dates estimated from the alignments after trimming are more recent than those estimated from the alignments before trimming. Conclusions Our results highlight that alignment uncertainty of the updated mammal data set and the choice of substitution models have little impact on tree topologies yielded by coalescent methods for species tree estimation, whereas they are more influential on the trees made by concatenation. Given the choice of calibration scheme and clock models, divergence time estimates are robust to the choice of substitution models, but removing alignments deemed problematic by trimming algorithms can lead to more recent dates. Although the fossil prior is important in divergence time estimation, Bayesian estimates of divergence times in this data set are driven primarily by the sequence data.

Funder

National Science Foundation

National Natural Science Foundation of China

Publisher

Springer Science and Business Media LLC

Subject

Ecology, Evolution, Behavior and Systematics

同舟云学术

1.学者识别学者识别

2.学术分析学术分析

3.人才评估人才评估

"同舟云学术"是以全球学者为主线,采集、加工和组织学术论文而形成的新型学术文献查询和分析系统,可以对全球学者进行文献检索和人才价值评估。用户可以通过关注某些学科领域的顶尖人物而持续追踪该领域的学科进展和研究前沿。经过近期的数据扩容,当前同舟云学术共收录了国内外主流学术期刊6万余种,收集的期刊论文及会议论文总量共计约1.5亿篇,并以每天添加12000余篇中外论文的速度递增。我们也可以为用户提供个性化、定制化的学者数据。欢迎来电咨询!咨询电话:010-8811{复制后删除}0370

www.globalauthorid.com

TOP

Copyright © 2019-2024 北京同舟云网络信息技术有限公司
京公网安备11010802033243号  京ICP备18003416号-3