Abstract
Most phylogenetic analyses assume that a single evolutionary history underlies one gene. However, both biological processes and errors in dataset assembly can violate this assumption, causing intragenic conflict. The extent to which this conflict is present in empirical datasets is not well documented. However, if common, it would have far-reaching implications for phylogenetic analyses. Here, we examined several large phylogenomic datasets from diverse taxa using a fast and simple method to identify well-supported intragenic conflict. We found conflict to be highly variable between datasets, from 1% to more than 92% of genes investigated. To better characterize patterns of conflict, we analyzed four genes with no obvious data assembly errors in more detail. Analyses of simulated data highlighted that alignment error may be one major source of conflict. Whether as part of data analysis pipelines or in order to explore potential biologically compelling intragenic processes, analyses of within-gene signal should become common. The method presented here provides a relatively fast means of identifying conflicts that is agnostic to the generating process. Datasets identified as having high intragenic conflict may either have significant errors in dataset assembly or represent conflict generated by biological processes. Conflicts that are the result of error should be identified and discarded or corrected. For those conflicts that are the result of biological processes, these analyses contribute to the growing consensus that, similar to genomes, genes themselves may exhibit multiple conflicting evolutionary histories across the tree of life.
Publisher
Cold Spring Harbor Laboratory