Clustering under approximation stability

Author:

Balcan Maria-Florina (1), Blum Avrim (2), Gupta Anupam (2)

Affiliation:

1. Georgia Institute of Technology, Atlanta, GA

2. Carnegie Mellon University, Pittsburgh, PA

Abstract

A common approach to clustering data is to view data objects as points in a metric space, and then to optimize a natural distance-based objective such as the k-median, k-means, or min-sum score. For applications such as clustering proteins by function or clustering images by subject, the implicit hope in taking this approach is that the optimal solution for the chosen objective will closely match the desired “target” clustering (e.g., a correct clustering of proteins by function or of images by who is in them). However, most distance-based objectives, including those mentioned here, are NP-hard to optimize. So, this assumption by itself is not sufficient, assuming P ≠ NP, to achieve clusterings of low error via polynomial-time algorithms. In this article, we show that we can bypass this barrier if we slightly extend this assumption to ask that for some small constant c, not only the optimal solution, but also all c-approximations to the optimal solution, differ from the target on at most some ϵ fraction of points; we call this (c,ϵ)-approximation-stability. We show that under this condition, it is possible to efficiently obtain low-error clusterings even if the property holds only for values c for which the objective is known to be NP-hard to approximate. Specifically, for any constant c > 1, (c,ϵ)-approximation-stability of k-median or k-means objectives can be used to efficiently produce a clustering of error O(ϵ) with respect to the target clustering, as can stability of the min-sum objective if the target clusters are sufficiently large. Thus, we can perform nearly as well in terms of agreement with the target clustering as if we could approximate these objectives to this NP-hard value.
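As an illustration only, the stability condition can be checked by hand on a toy instance. The Python sketch below is not taken from the article: the function names (`kmedian_cost`, `clustering_error`, `violates_stability`), the brute-force search over label permutations, and the one-dimensional data are all assumptions made for this example. It reads "differ from the target on at most some ϵ fraction of points" as the fraction of points mislabeled under the best matching of cluster labels.

```python
from itertools import permutations


def kmedian_cost(points, labels, centers, dist):
    """k-median score: sum of distances from each point to its cluster's center."""
    return sum(dist(p, centers[l]) for p, l in zip(points, labels))


def clustering_error(labels, target, k):
    """Fraction of points that disagree with the target clustering,
    minimized over all relabelings of the k cluster ids.
    Brute force over k! permutations, so only suitable for small k."""
    n = len(labels)
    best = n
    for perm in permutations(range(k)):
        mistakes = sum(1 for a, t in zip(labels, target) if perm[a] != t)
        best = min(best, mistakes)
    return best / n


def violates_stability(cost, opt_cost, error, c, eps):
    """True if a clustering is a c-approximation (cost <= c * opt_cost)
    yet differs from the target on more than an eps fraction of points.
    One such clustering is enough to rule out (c, eps)-approximation-stability."""
    return cost <= c * opt_cost and error > eps


# Hypothetical toy instance on the real line with k = 2.
points = [0.0, 1.0, 10.0, 11.0]
target = [0, 0, 1, 1]


def line_dist(a, b):
    return abs(a - b)


opt_cost = kmedian_cost(points, target, [0.0, 10.0], line_dist)   # 2.0
candidate = [0, 0, 0, 1]
cand_cost = kmedian_cost(points, candidate, [1.0, 11.0], line_dist)  # 10.0
cand_err = clustering_error(candidate, target, k=2)                  # 0.25
```

With these numbers, `violates_stability(cand_cost, opt_cost, cand_err, c=5, eps=0.1)` is true, so the toy instance is not (5, 0.1)-approximation-stable, while the same candidate is not a counterexample for c = 2 because its cost exceeds 2 times the optimum.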

Funder

Division of Computing and Communication Foundations

Microsoft Research

Google

Publisher

Association for Computing Machinery (ACM)

Subject

Artificial Intelligence, Hardware and Architecture, Information Systems, Control and Systems Engineering, Software

Cited by 47 articles.

1. Strategyproof Facility Location in Perturbation Stable Instances;Web and Internet Economics;2022

2. k-center Clustering under Perturbation Resilience;ACM Transactions on Algorithms;2020-04-27

3. Index;Foundations of Data Science;2020-01-23

4. Background Material;Foundations of Data Science;2020-01-23

5. Wavelets;Foundations of Data Science;2020-01-23
