Abstract
Model selection aims to choose the most adequate model for the statistical analysis at hand. The model must be complex enough to capture the structure of the data yet simple enough not to overfit. In phylogenetics, the most common model selection scenario concerns selecting an appropriate substitution and partition model for sequence evolution to infer a phylogenetic tree. Here we explored the impact of substitution model over-parameterization in a Bayesian statistical framework. We performed simulations under the simplest substitution model, the Jukes-Cantor model, and compared posterior estimates of phylogenetic tree topology and tree length under the true model with those under the most complex model, the GTR+Γ+I substitution model, including over-splitting the data into additional subsets (i.e., applying partitioned models). We explored four choices of prior distributions: the default substitution model priors of MrBayes, BEAST2 and RevBayes, and a newly devised prior choice (Tame). Our results show that Bayesian inference of phylogeny is robust to substitution model over-parameterization, but only under our new prior settings. All three default priors introduced biases in the estimated tree length. We conclude that substitution and partition model selection are superfluous steps in Bayesian phylogenetic inference pipelines if well-behaved prior distributions are applied.
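To make the simulation setting concrete, the following is a minimal sketch of evolving a sequence along a single branch under the Jukes-Cantor model, using the standard closed-form transition probabilities. It is an illustration only and not the simulation code used in the study; the function names `jc69_transition_probs` and `evolve_sequence` are hypothetical, and the branch length is assumed to be measured in expected substitutions per site.

```python
import math
import random

NUCLEOTIDES = "ACGT"


def jc69_transition_probs(branch_length):
    """Return (p_same, p_diff) under Jukes-Cantor for one branch.

    p_same is the probability that a site keeps its state; p_diff is the
    probability of ending in one specific other nucleotide.
    Assumes branch_length is in expected substitutions per site.
    """
    decay = math.exp(-4.0 * branch_length / 3.0)
    p_same = 0.25 + 0.75 * decay
    p_diff = 0.25 - 0.25 * decay
    return p_same, p_diff


def evolve_sequence(sequence, branch_length, rng):
    """Evolve each site independently along one branch (hypothetical helper)."""
    p_same, _ = jc69_transition_probs(branch_length)
    evolved = []
    for base in sequence:
        if rng.random() < p_same:
            evolved.append(base)
        else:
            # All three alternative nucleotides are equally likely under JC.
            evolved.append(rng.choice([b for b in NUCLEOTIDES if b != base]))
    return "".join(evolved)


rng = random.Random(1)
ancestor = "".join(rng.choice(NUCLEOTIDES) for _ in range(20))
descendant = evolve_sequence(ancestor, 0.1, rng)
```

Because Jukes-Cantor has equal base frequencies and a single exchange rate, it has no free substitution parameters, which is why fitting GTR+Γ+I to data simulated this way counts as over-parameterization.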
Publisher
Cold Spring Harbor Laboratory
Cited by
9 articles.