Affiliation:
1. University of Paderborn, Paderborn, Germany
2. Technische Universität Dortmund, Dortmund, Germany
Abstract
We study a generalization of the $k$-median problem with respect to an arbitrary dissimilarity measure $D$. Given a finite set $P$ of size $n$, our goal is to find a set $C$ of size $k$ such that the sum of errors $D(P,C) = \sum_{p \in P} \min_{c \in C} D(p,c)$ is minimized. The main result in this article can be stated as follows: There exists a $(1+\epsilon)$-approximation algorithm for the $k$-median problem with respect to $D$, if the 1-median problem can be approximated within a factor of $(1+\epsilon)$ by taking a random sample of constant size and solving the 1-median problem on the sample exactly. This algorithm requires time $n \, 2^{O(mk \log(mk/\epsilon))}$, where $m$ is a constant that depends only on $\epsilon$ and $D$. Using this characterization, we obtain the first linear-time $(1+\epsilon)$-approximation algorithms for the $k$-median problem in an arbitrary metric space with bounded doubling dimension, for the Kullback-Leibler divergence (relative entropy), for the Itakura-Saito divergence, for Mahalanobis distances, and for some special cases of Bregman divergences. Moreover, we obtain previously known results for the Euclidean $k$-median problem and the Euclidean $k$-means problem in a simplified manner. Our results are based on a new analysis of an algorithm of Kumar et al. [2004].
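The objective and the sampling condition in the abstract can be illustrated with a minimal Python sketch. It is not the authors' algorithm: it only evaluates the sum of errors for a given set of centers and shows the "sample, then solve 1-median exactly on the sample" step for one concrete choice of dissimilarity. The squared Euclidean distance is assumed here because its exact 1-median is the centroid; the names `sq_euclidean`, `clustering_cost`, `centroid`, and `sampled_one_median`, and the sample size `m` passed by the caller, are hypothetical and chosen for illustration.

```python
import random
from typing import Callable, Sequence, Tuple

Point = Tuple[float, ...]


def sq_euclidean(p: Point, c: Point) -> float:
    """Squared Euclidean distance, used here as an example dissimilarity D."""
    return sum((a - b) ** 2 for a, b in zip(p, c))


def clustering_cost(
    P: Sequence[Point],
    C: Sequence[Point],
    D: Callable[[Point, Point], float] = sq_euclidean,
) -> float:
    """Sum of errors: each point pays the dissimilarity to its closest center."""
    return sum(min(D(p, c) for c in C) for p in P)


def centroid(S: Sequence[Point]) -> Point:
    """Exact 1-median under squared Euclidean distance (the mean of S)."""
    n = len(S)
    dim = len(S[0])
    return tuple(sum(x[i] for x in S) / n for i in range(dim))


def sampled_one_median(P: Sequence[Point], m: int, rng: random.Random) -> Point:
    """Draw m points uniformly with replacement and solve 1-median on the sample.

    Assumption: D is squared Euclidean, so the sample's centroid is its
    exact 1-median. Other dissimilarities need their own exact solver.
    """
    sample = [rng.choice(P) for _ in range(m)]
    return centroid(sample)


if __name__ == "__main__":
    points = [(0.0, 0.0), (2.0, 0.0), (10.0, 0.0)]
    centers = [(1.0, 0.0), (10.0, 0.0)]
    print(clustering_cost(points, centers))
    print(sampled_one_median(points, m=2, rng=random.Random(0)))
```

The cost function takes `D` as a parameter so that another dissimilarity (for example a Kullback-Leibler divergence) could be plugged in; only the 1-median solver on the sample would need to change.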
Funder
Deutsche Forschungsgemeinschaft
Publisher
Association for Computing Machinery (ACM)
Subject
Mathematics (miscellaneous)
Cited by
51 articles.