Abstract
Premise
In phylogenomic analyses, no consensus exists on whether using single nucleotide polymorphisms (SNPs) or including flanking regions (full ‘locus’) is best, nor how strictly missing data should be filtered. Moreover, empirical evidence on whether SNP-only trees are suitable for downstream phylogenetic comparative methods such as divergence time estimation and ancestral state reconstructions is lacking.

Methods
Using GBS data from 22 taxa of Glycine, we addressed the effects of SNP vs. locus usage and filtering stringency on phylogenomic inference and phylogenetic comparative methods. We compared branch length, node support, and divergence time estimation across eight datasets with varying amounts of missing data and total size.

Results
Our results reveal five aspects of phylogenomic data usage:
1. tree topology is largely congruent regardless of data type or filtering parameters;
2. filtering missing data too strictly reduces the confidence in some relationships;
3. absolute branch lengths vary by two orders of magnitude between datasets;
4. data type and branch length variation have little effect on divergence time estimation;
5. phylograms significantly alter the estimation of ancestral states.

Discussion
When conducting phylogenomic analyses we recommend not to filter datasets too strictly to minimize the risk of misleading topologies, low support, and inaccurate divergence times.
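
To make the notion of filtering stringency concrete, the following is a minimal sketch of one common way to filter by missing data: a locus is retained only if the proportion of taxa lacking it does not exceed a threshold. This is not the authors' pipeline; the function name `filter_loci`, the `max_missing` parameter, and the toy sequences are all hypothetical and chosen for illustration only.

```python
from typing import Dict, Optional

# Hypothetical toy layout: locus name -> {taxon -> sequence, or None if the taxon lacks the locus}
Loci = Dict[str, Dict[str, Optional[str]]]


def filter_loci(loci: Loci, max_missing: float) -> Loci:
    """Keep loci whose proportion of taxa without data is <= max_missing."""
    kept: Loci = {}
    for name, seqs in loci.items():
        if not seqs:
            continue  # a locus with no taxa at all carries no information
        n_missing = sum(1 for s in seqs.values() if s is None)
        if n_missing / len(seqs) <= max_missing:
            kept[name] = seqs
    return kept


# Illustrative data only (four taxa, three loci); not taken from the study.
toy: Loci = {
    "locus1": {"A": "ACGT", "B": "ACGA", "C": "ACGT", "D": "ACTT"},  # 0% missing
    "locus2": {"A": "GGCA", "B": None, "C": "GGCA", "D": "GGTA"},    # 25% missing
    "locus3": {"A": "TTAC", "B": None, "C": None, "D": None},        # 75% missing
}

lenient = filter_loci(toy, max_missing=0.75)  # tolerant of missing data
strict = filter_loci(toy, max_missing=0.0)    # complete loci only

print(len(lenient), len(strict))
```

In this toy case the strict filter discards two of the three loci, which illustrates how a tighter threshold trades missing data for a smaller total dataset.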
Publisher
Cold Spring Harbor Laboratory