Weighted Centroid Trees: A general approach for summarizing phylogenies in tumor mutation tree inference

Author:

Vasei Hamed, Foroughmand Araabi Mohammad Hadi, Daneshgar Amir

Abstract

Tumor mutation trees are the primary tools for modeling the evolution of cancer. Some tumor phylogeny inference methods produce a set of trees representing potential, parallel evolutionary histories, and mutation trees from different patients may also exhibit similar evolutionary processes. When a set of correlated mutation trees is available, compressing the data into a single best-fit tree that exhibits the shared evolutionary processes is valuable and can benefit many applications. In this study, we present a general setup to study and analyse the problem of finding a best-fit (centroid) tree for a given set of trees, and we use this setup to analyse mutation trees as our main motivation. To this end, let ε: 𝒯_n → ℝ^{n×n} be an embedding of labeled rooted trees into the space of real square matrices, and let L be a norm on this space. We introduce the nearest mapped tree problem as the problem of finding a closest tree to a given matrix A with respect to ε and L, i.e., a tree T*(A) for which L(ε(T*(A)) − A) is minimized. Within this setup, our candidates for the embedding are the adjacency, ancestry, and distance matrices of trees, and we consider the L_1 and L_2 norms in our analysis. We show that the function d(T_1, T_2) = L(ε(T_1) − ε(T_2)) defines a family of dissimilarity measures, covering the previously studied parent-child and ancestor-descendant metrics. We also show that the nearest mapped tree problem is polynomial-time solvable for the adjacency matrix embedding and is 𝒩𝒫-hard for the ancestry and the distance embeddings. The weighted centroid tree problem for a given set of k trees is naturally defined as a nearest mapped tree solution to a weighted sum of the corresponding set of matrices. In this article we consider uniform weighted sums, in which all weights are equal. In particular, the (classical) centroid tree is defined to be a solution when all weights are equal to 1/k (i.e., the mean case). Similarly, the ω-weighted centroid tree is a solution when all weights are equal to ω/k. To show the generality of our setup, we prove that the solution sets of the centroid tree problem for the adjacency and the ancestry matrices are identical to the solution sets of the consensus tree problem for the parent-child and ancestor-descendant distances, already handled by the algorithms GraPhyC (2018) and TuELiP (2023), respectively. Next, to tackle this problem for some new cases, we provide integer linear programs for the nearest mapped tree problem under the ancestry and the distance embeddings, giving rise to solutions of the weighted centroid tree problem in these cases. To show the effectiveness of this approach, we provide an algorithm, WAncILP2, that solves the 2-weighted centroid tree problem for the ancestry matrix, and we justify the importance of the weighted setup by showing the strong performance of WAncILP2 both in a comprehensive simulation analysis and on a real breast cancer dataset. In the latter, by finding the centroids as representatives of data clusters, we provide supporting evidence that some common aspects of these centroids can be considered suitable candidates for reliable evolutionary information in relation to the original data.
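The definitions in the abstract can be made concrete on a toy instance. The following is a minimal sketch under stated assumptions, not the authors' implementation: trees are encoded as parent lists (`parent[v]` is the parent of node `v`, `None` for the root), the ancestry matrix is taken to be strict (zero diagonal), the norms are applied entrywise, and any node is allowed to be the root. All function names are hypothetical. The exhaustive search stands in for the integer linear programs mentioned in the abstract and is only feasible for very small n.

```python
from itertools import product

import numpy as np


def adjacency_matrix(parent):
    """Embedding: entry (p, v) is 1 when p is the parent of v."""
    n = len(parent)
    A = np.zeros((n, n))
    for v, p in enumerate(parent):
        if p is not None:
            A[p, v] = 1.0
    return A


def ancestry_matrix(parent):
    """Embedding: entry (u, v) is 1 when u is a strict ancestor of v (assumed convention)."""
    n = len(parent)
    M = np.zeros((n, n))
    for v in range(n):
        u = parent[v]
        while u is not None:  # walk from v up to the root
            M[u, v] = 1.0
            u = parent[u]
    return M


def entrywise_norm(X, order=1):
    """Entrywise L1 (order=1) or L2 (order=2) norm of a matrix."""
    return float(np.linalg.norm(X.ravel(), ord=order))


def dissimilarity(t1, t2, embed, order=1):
    """d(T1, T2) = L(embed(T1) - embed(T2))."""
    return entrywise_norm(embed(t1) - embed(t2), order)


def weighted_sum(trees, embed, omega=1.0):
    """Uniform weighted sum with every weight equal to omega / k."""
    k = len(trees)
    return (omega / k) * sum(embed(t) for t in trees)


def _reaches_root(parent, v):
    """True if following parent pointers from v ends at the root (no cycle)."""
    for _ in range(len(parent)):
        if parent[v] is None:
            return True
        v = parent[v]
    return False


def all_rooted_trees(n):
    """Enumerate every labeled rooted tree on n nodes as a parent list."""
    for cand in product([None, *range(n)], repeat=n):
        if sum(p is None for p in cand) != 1:
            continue  # a tree has exactly one root
        if all(_reaches_root(cand, v) for v in range(n)):
            yield list(cand)


def nearest_trees(A, embed, order=1):
    """Brute-force nearest mapped tree: all trees T minimizing L(embed(T) - A)."""
    scored = [(entrywise_norm(embed(t) - A, order), t)
              for t in all_rooted_trees(A.shape[0])]
    best = min(score for score, _ in scored)
    return [t for score, t in scored if abs(score - best) < 1e-9]


chain = [None, 0, 1]  # 0 -> 1 -> 2
star = [None, 0, 0]   # 0 -> 1 and 0 -> 2

d_parent_child = dissimilarity(chain, star, adjacency_matrix)  # 2.0
d_ancestry = dissimilarity(chain, star, ancestry_matrix)       # 1.0

# Mean case (omega = 1): chain and star are equally close to the mean matrix.
centroids = nearest_trees(weighted_sum([chain, star], ancestry_matrix), ancestry_matrix)

# omega = 2: doubling the target makes the chain the unique minimizer here.
centroids_w2 = nearest_trees(
    weighted_sum([chain, star], ancestry_matrix, omega=2.0), ancestry_matrix
)
```

On this two-tree input the mean case returns both input trees as tied centroids, while the 2-weighted target selects the chain alone. This only illustrates that the weight ω can change the solution set; it says nothing about the behaviour of WAncILP2 on real data.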

Publisher

Cold Spring Harbor Laboratory

