Spectral top-down recovery of latent tree models

Author:

Aizenbud Yariv (1), Jaffe Ariel (1), Wang Meng (2), Hu Amber (1), Amsel Noah (1), Nadler Boaz (3), Chang Joseph T (4), Kluger Yuval (1, 2, 5)

Affiliation:

1. Program in Applied Mathematics, Yale University, New Haven, CT 06511, USA

2. Department of Pathology, Yale University, New Haven, CT 06511, USA

3. Department of Computer Science, Weizmann Institute of Science, Rehovot 76100, Israel

4. Department of Statistics, Yale University, New Haven, CT 06520, USA

5. Interdepartmental Program in Computational Biology and Bioinformatics, Yale University, New Haven, CT 06511, USA

Abstract

Modeling the distribution of high-dimensional data by a latent tree graphical model is a prevalent approach in multiple scientific domains. A common task is to infer the underlying tree structure, given only observations of its terminal nodes. Many algorithms for tree recovery are computationally intensive, which limits their applicability to trees of moderate size. For large trees, a common approach, termed divide-and-conquer, is to recover the tree structure in two steps. First, separately recover the structure of multiple, possibly random subsets of the terminal nodes. Second, merge the resulting subtrees to form a full tree. Here, we develop spectral top-down recovery (STDR), a deterministic divide-and-conquer approach to infer large latent tree models. Unlike previous methods, STDR partitions the terminal nodes in a nonrandom way, based on the Fiedler vector of a suitable Laplacian matrix related to the observed nodes. We prove that under certain conditions, this partitioning is consistent with the tree structure. This, in turn, leads to a significantly simpler procedure for merging the small subtrees. We prove that STDR is statistically consistent and bound the number of samples required to accurately recover the tree with high probability. Using simulated data from several common tree models in phylogenetics, we demonstrate that STDR has a significant advantage in terms of runtime, with improved or similar accuracy.
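The partitioning step described in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation: the abstract does not specify which Laplacian STDR uses, so the sketch assumes an unnormalized graph Laplacian built from a symmetric, nonnegative similarity matrix between terminal nodes. The function name `fiedler_partition` and the toy matrix are hypothetical, chosen only for illustration.

```python
import numpy as np

def fiedler_partition(similarity):
    """Split nodes into two groups by the sign of the Fiedler vector.

    Assumption: `similarity` is a symmetric, nonnegative matrix of
    pairwise similarities between terminal nodes. The unnormalized
    Laplacian used here is an illustrative choice, not necessarily
    the matrix used by STDR.
    """
    W = np.asarray(similarity, dtype=float).copy()
    np.fill_diagonal(W, 0.0)              # ignore self-similarity
    L = np.diag(W.sum(axis=1)) - W        # unnormalized Laplacian D - W
    # eigh returns eigenvalues in ascending order for symmetric matrices
    _, eigvecs = np.linalg.eigh(L)
    fiedler = eigvecs[:, 1]               # eigenvector of 2nd smallest eigenvalue
    return fiedler >= 0                   # boolean mask: one side of the split

# Toy example (hypothetical data): nodes 0-2 are strongly similar to each
# other, nodes 3-5 likewise, and the two groups are only weakly similar.
S = np.full((6, 6), 0.1)
S[:3, :3] = 0.8
S[3:, 3:] = 0.8
mask = fiedler_partition(S)
print(mask)
```

The sign of an eigenvector is arbitrary, so which group is labeled `True` may vary, but the two groups themselves are recovered. In a top-down scheme of the kind the abstract describes, such a split would be applied recursively until the subsets are small enough for a standard tree recovery method.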

Funder

National Institutes of Health

Israel Science Foundation

Publisher

Oxford University Press (OUP)

Subject

Applied Mathematics, Computational Theory and Mathematics, Numerical Analysis, Statistics and Probability, Analysis

