Fitting Distances by Tree Metrics Minimizing the Total Error within a Constant Factor

Author:

Vincent Cohen-Addad (1), Debarati Das (2), Evangelos Kipouridis (2), Nikos Parotsidis (1), Mikkel Thorup (2)

Affiliation:

1. Google Research, Switzerland

2. Department of Computer Science, University of Copenhagen, Denmark

Abstract

We consider the numerical taxonomy problem of fitting a positive distance function \(\mathcal{D}:\binom{S}{2}\rightarrow \mathbb{R}_{>0}\) by a tree metric. We want a tree T with positive edge weights that includes S among its vertices, so that the distances between points of S in T match those in \(\mathcal{D}\). A nice application is in evolutionary biology, where the tree T aims to approximate the branching process leading to the observed distances in \(\mathcal{D}\) [Cavalli-Sforza and Edwards 1967]. We consider the total error, that is, the sum of distance errors over all pairs of points. We present a deterministic polynomial-time algorithm minimizing the total error within a constant factor. We can do this both for general trees and for the special case of ultrametrics with a root having the same distance to all vertices in S. The problems are APX-hard, so a constant factor is the best we can hope for in polynomial time. The best previous approximation factor was O((log n)(log log n)) by Ailon and Charikar [2005], who wrote “Determining whether an O(1) approximation can be obtained is a fascinating question”.
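To make the objective concrete, the following minimal sketch evaluates the total error of a given candidate tree against a distance function, taking the distance error of a pair to be the absolute difference between its tree distance and its value in D. It is not the approximation algorithm from the paper; it only computes the quantity being minimized. The function names, the edge-list representation of the tree, and the toy instance are assumptions made for illustration.

```python
from itertools import combinations


def tree_distances(edges, source):
    """Distances from `source` to every vertex of a weighted tree.

    `edges` is a list of (u, v, weight) triples (an assumed representation).
    """
    adj = {}
    for u, v, w in edges:
        adj.setdefault(u, []).append((v, w))
        adj.setdefault(v, []).append((u, w))
    dist = {source: 0.0}
    stack = [source]
    while stack:
        u = stack.pop()
        for v, w in adj.get(u, []):
            if v not in dist:  # a tree has a unique path to each vertex
                dist[v] = dist[u] + w
                stack.append(v)
    return dist


def total_error(edges, D):
    """Sum over all pairs {i, j} of S of |dist_T(i, j) - D(i, j)|.

    `D` maps frozenset({i, j}) to a positive number; S is read off its keys.
    """
    points = sorted({p for pair in D for p in pair})
    from_point = {p: tree_distances(edges, p) for p in points}
    return sum(
        abs(from_point[i][j] - D[frozenset((i, j))])
        for i, j in combinations(points, 2)
    )


# Hypothetical toy instance: a star with internal vertex "x" and leaves a, b, c.
edges = [("a", "x", 1.0), ("b", "x", 1.0), ("c", "x", 2.0)]
D = {
    frozenset(("a", "b")): 2.0,
    frozenset(("a", "c")): 3.5,
    frozenset(("b", "c")): 3.0,
}
print(total_error(edges, D))  # 0.5: only the pair {a, c} is off, by 0.5
```

The internal vertex "x" is not in S, which matches the abstract's requirement that the tree include S among its vertices rather than consist of S alone.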

Publisher

Association for Computing Machinery (ACM)

Subject

Artificial Intelligence, Hardware and Architecture, Information Systems, Control and Systems Engineering, Software

References (55 articles)

1. Amir Abboud, Vincent Cohen-Addad, and Hussein Houdrouge. 2019. Subquadratic High-Dimensional Hierarchical Clustering. In NeurIPS. 11576–11586.

2. On the Approximability of Numerical Taxonomy (Fitting Distances by Tree Metrics)

3. Fitting Tree Metrics: Hierarchical Clustering and Phylogeny

4. Aggregating inconsistent information

5. Noga Alon, Yossi Azar, and Danny Vainstein. 2020. Hierarchical Clustering: A 0.585 Revenue Approximation. In COLT, Vol. 125. 153–162.
