Building alternative consensus trees and supertrees using k-means and Robinson and Foulds distance

Author:

Tahiri Nadia (1, 2), Fichet Bernard (3), Makarenkov Vladimir (1)

Affiliation:

1. Département d’informatique, Université du Québec à Montréal, Montreal, QC H2X 3Y7, Canada

2. Département d’informatique, Université de Sherbrooke, Sherbrooke, QC J1K 2X9, Canada

3. Faculté de Médecine, Aix-Marseille Université, Marseille F-13385, France

Abstract

Motivation

Each gene has its own evolutionary history which can substantially differ from evolutionary histories of other genes. For example, some individual genes or operons can be affected by specific horizontal gene transfer or recombination events. Thus, the evolutionary history of each gene should be represented by its own phylogenetic tree which may display different evolutionary patterns from the species tree that accounts for the main patterns of vertical descent. However, the output of traditional consensus tree or supertree inference methods is a unique consensus tree or supertree.

Results

We present a new efficient method for inferring multiple alternative consensus trees and supertrees to best represent the most important evolutionary patterns of a given set of gene phylogenies. We show how an adapted version of the popular k-means clustering algorithm, based on some remarkable properties of the Robinson and Foulds distance, can be used to partition a given set of trees into one (for homogeneous data) or multiple (for heterogeneous data) cluster(s) of trees. Moreover, we adapt the popular Caliński–Harabasz, Silhouette, Ball and Hall, and Gap cluster validity indices to tree clustering with k-means. Special attention is given to the relevant but very challenging problem of inferring alternative supertrees. The use of the Euclidean property of the objective function of the method makes it faster than the existing tree clustering techniques, and thus better suited for analyzing large evolutionary datasets.

Availability and implementation

Our KMeansSuperTreeClustering program along with its C++ source code is available at: https://github.com/TahiriNadia/KMeansSuperTreeClustering.

Supplementary information

Supplementary data are available at Bioinformatics online.
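As a rough illustration of the clustering idea described under Results, the sketch below scores a partition of trees using the Robinson and Foulds (RF) distance. It is not the KMeansSuperTreeClustering implementation. It assumes that each tree is encoded as the set of its non-trivial clusters (a simplification; a real program would parse Newick files), and that the objective is the within-cluster sum of pairwise RF distances divided by the cluster size. That form follows from a standard identity: if trees are viewed as 0/1 vectors over clusters, the RF distance equals the squared Euclidean distance between those vectors, so the usual k-means sum of squares can be computed from pairwise distances without building a centroid. The abstract does not spell out the exact objective, and the names `rf_distance`, `kmeans_objective`, and the toy trees are hypothetical.

```python
from itertools import combinations


def rf_distance(tree_a, tree_b):
    """RF distance: number of clusters present in exactly one of the two trees."""
    return len(tree_a ^ tree_b)


def kmeans_objective(trees, labels):
    """Sum over clusters of (pairwise RF sum within the cluster) / (cluster size).

    Assumed form of the objective (see the lead-in); lower is better.
    """
    members_by_label = {}
    for index, label in enumerate(labels):
        members_by_label.setdefault(label, []).append(index)

    total = 0.0
    for members in members_by_label.values():
        pair_sum = sum(
            rf_distance(trees[i], trees[j]) for i, j in combinations(members, 2)
        )
        total += pair_sum / len(members)
    return total


# Toy data (hypothetical): four rooted trees on taxa a..e, each given
# by its non-trivial clusters. T1 and T2 share the cluster {a, b};
# T3 and T4 share the cluster {d, e}.
T1 = {frozenset("ab"), frozenset("abc")}
T2 = {frozenset("ab"), frozenset("abd")}
T3 = {frozenset("de"), frozenset("cde")}
T4 = {frozenset("de"), frozenset("bde")}
trees = [T1, T2, T3, T4]

two_clusters = kmeans_objective(trees, [0, 0, 1, 1])
one_cluster = kmeans_objective(trees, [0, 0, 0, 0])
print(two_clusters, one_cluster)  # 2.0 5.0
```

On this toy input, grouping the trees by their shared cluster gives a lower objective than keeping all four trees together. In practice the number of clusters would be chosen with a validity index such as those named in the abstract, since the raw objective never increases as clusters are added.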

Funder

Natural Sciences and Engineering Research Council of Canada

Fonds de Recherche sur la Santé of Québec and Fonds de Recherche sur la Nature et Technologies of Québec

Publisher

Oxford University Press (OUP)

Subject

Computational Mathematics, Computational Theory and Mathematics, Computer Science Applications, Molecular Biology, Biochemistry, Statistics and Probability


Cited by 5 articles.
