Abstract
Species tree inference under the multi-species coalescent (MSC) model is a basic step in biological discovery. Although recent years have produced methods that are proven statistically consistent and that have high accuracy, large datasets still create computational challenges. Some information about the species tree is generally available and could be used to speed up the estimation, but only one method (ASTRAL-J, a recent development in the ASTRAL family of methods) is able to use this information. Here we describe two new methods, NJst-J and FASTRAL-J, that can estimate the species tree given partial knowledge of the species tree in the form of a non-binary unrooted constraint tree. We show that both NJst-J and FASTRAL-J are much faster than ASTRAL-J, and we prove that all three methods are statistically consistent under the MSC model subject to this constraint. Our extensive simulation study shows that both FASTRAL-J and NJst-J provide advantages over ASTRAL-J: both are faster (and NJst-J is particularly fast), and FASTRAL-J is generally at least as accurate as ASTRAL-J. An analysis of the Avian Phylogenomics project dataset with 48 species and 14,446 genes presents additional evidence of the value of FASTRAL-J over ASTRAL-J (and both over ASTRAL), with dramatic reductions in running time (20 hours for default ASTRAL, and minutes or seconds for ASTRAL-J and FASTRAL-J, respectively).
Availability
FASTRAL-J and NJst-J are available in open source form at https://github.com/RuneBlaze/FASTRAL-constrained and https://github.com/RuneBlaze/NJst-constrained. Locations of the datasets used in this study and the detailed commands needed to reproduce it are provided in the supplementary materials at http://tandy.cs.illinois.edu/baqiao-suppl.pdf.
Publisher
Cold Spring Harbor Laboratory