Abstract
Gene tree discordance due to incomplete lineage sorting or introgression has been described in numerous genomic datasets. Among distantly related taxa, however, it is difficult to differentiate these biological sources of discordance from discordance due to errors in gene tree reconstruction, even when supervised machine learning techniques are used to infer individual gene trees. Here, rather than applying machine learning to the problem of inferring single tree topologies, we develop a model to infer important properties of a particular internal branch of the species tree via genome-scale summary statistics extracted from individual alignments and inferred gene trees. We show that our model can effectively predict the presence/absence of discordance, estimate the probability of discordance, and infer the correct species tree topology in the presence of multiple, common sources of error. While gene tree topology counts are the most salient predictors of discordance at short time scales, other genomic features become relevant for distantly related species. We validate our approach through simulation, and apply it to data from the deepest splits among metazoans. Our results suggest that the base of Metazoa experienced significant gene tree discordance, implying that discordant traits among current taxa can be explained without invoking homoplasy. In addition, we find support for Porifera as the sister clade to the rest of Metazoa. Overall, these results demonstrate how machine learning can be used to answer important phylogenetic questions, while marginalizing over individual gene tree (and even species tree) topologies.
Publisher
Cold Spring Harbor Laboratory
Cited by
7 articles.