Species Tree Estimation from Gene Trees by Minimizing Deep Coalescence and Maximizing Quartet Consistency: A Comparative Study and the Presence of Pseudo Species Tree Terraces

Author:

Farah Ishrat Tanzila1,Islam Muktadirul2,Zinat Kazi Tasnim13,Rahman Atif Hasan1,Bayzid Shamsuzzoha1

Affiliation:

1. Department of Computer Science and Engineering, Bangladesh University of Engineering and Technology, Dhaka 1205, Bangladesh

2. Applied Statistics and Data Science (ASDS), Department of Statistics, Jahangirnagar University, Dhaka 1342, Bangladesh

3. Department of Computer Science, University of Maryland, 8125 Paint Branch Drive, College Park, MD 20742, USA

Abstract

Abstract Species tree estimation from multilocus data sets is extremely challenging, especially in the presence of gene tree heterogeneity across the genome due to incomplete lineage sorting (ILS). Summary methods have been developed which estimate gene trees and then combine the gene trees to estimate a species tree by optimizing various optimization scores. In this study, we have extended and adapted the concept of phylogenetic terraces to species tree estimation by “summarizing” a set of gene trees, where multiple species trees with distinct topologies may have exactly the same optimality score (i.e., quartet score, extra lineage score, etc.). We particularly investigated the presence and impacts of equally optimal trees in species tree estimation from multilocus data using summary methods by taking ILS into account. We analyzed two of the most popular ILS-aware optimization criteria: maximize quartet consistency (MQC) and minimize deep coalescence (MDC). Methods based on MQC are provably statistically consistent, whereas MDC is not a consistent criterion for species tree estimation. We present a comprehensive comparative study of these two optimality criteria. Our experiments, on a collection of data sets simulated under ILS, indicate that MDC may result in competitive or identical quartet consistency score as MQC, but could be significantly worse than MQC in terms of tree accuracy—demonstrating the presence and impacts of equally optimal species trees. This is the first known study that provides the conditions for the data sets to have equally optimal trees in the context of phylogenomic inference using summary methods. [Gene tree; incomplete lineage sorting; phylogenomic analysis, species tree; summary method.]

Funder

Communication Technology Division

Government of the People’s Republic of Bangladesh

Publisher

Oxford University Press (OUP)

Subject

Genetics,Ecology, Evolution, Behavior and Systematics

Reference77 articles.

同舟云学术

1.学者识别学者识别

2.学术分析学术分析

3.人才评估人才评估

"同舟云学术"是以全球学者为主线,采集、加工和组织学术论文而形成的新型学术文献查询和分析系统,可以对全球学者进行文献检索和人才价值评估。用户可以通过关注某些学科领域的顶尖人物而持续追踪该领域的学科进展和研究前沿。经过近期的数据扩容,当前同舟云学术共收录了国内外主流学术期刊6万余种,收集的期刊论文及会议论文总量共计约1.5亿篇,并以每天添加12000余篇中外论文的速度递增。我们也可以为用户提供个性化、定制化的学者数据。欢迎来电咨询!咨询电话:010-8811{复制后删除}0370

www.globalauthorid.com

TOP

Copyright © 2019-2024 北京同舟云网络信息技术有限公司
京公网安备11010802033243号  京ICP备18003416号-3