Affiliation:
1. Department of Computer Science, University of Illinois at Urbana-Champaign, Urbana, IL 61801, USA
Abstract
Motivation
Species tree estimation is a basic part of biological research but can be challenging because of gene duplication and loss (GDL), which results in genes that can appear more than once in a given genome. All common approaches in phylogenomic studies either reduce available data or are error-prone, and thus, scalable methods that do not discard data and have high accuracy on large heterogeneous datasets are needed.
Results
We present FastMulRFS, a polynomial-time method for estimating species trees without knowledge of orthology. We prove that FastMulRFS is statistically consistent under a generic model of GDL when adversarial GDL does not occur. Our extensive simulation study shows that FastMulRFS matches the accuracy of MulRF (which tries to solve the same optimization problem) and has better accuracy than prior methods, including ASTRAL-multi (the only method to date that has been proven statistically consistent under GDL), while being much faster than both methods.
Availability and implementation
FastMulRFS is available on GitHub (https://github.com/ekmolloy/fastmulrfs).
Supplementary information
Supplementary data are available at Bioinformatics online.
Funder
National Science Foundation
NSF
Ira and Debra Cohen Graduate Fellowship in Computer Science
Illinois Campus Cluster
National Center for Supercomputing Applications
Publisher
Oxford University Press (OUP)
Subject
Computational Mathematics, Computational Theory and Mathematics, Computer Science Applications, Molecular Biology, Biochemistry, Statistics and Probability
Cited by
32 articles.