Authors:
Anna Arutyunova, Anna Großwendt, Heiko Röglin, Melanie Schmidt, Julian Wargalla
Abstract
In a hierarchical clustering problem the task is to compute a series of mutually compatible clusterings of a finite metric space $$(P,{{\,\textrm{dist}\,}})$$. Starting with the clustering where every point forms its own cluster, one iteratively merges two clusters until only one cluster remains. Complete linkage is a well-known and popular algorithm to compute such clusterings: in every step it merges the two clusters whose union has the smallest radius (or diameter) among all currently possible merges. We prove that the radius (or diameter) of every k-clustering computed by complete linkage is at most a factor of O(k) (or $$O(k^{\ln (3)/\ln (2)})=O(k^{1.59})$$) worse than that of an optimal k-clustering minimizing the radius (or diameter). Furthermore, we give a negative answer to the question posed by Dasgupta and Long (J Comput Syst Sci 70(4):555–569, 2005. https://doi.org/10.1016/j.jcss.2004.10.006), who show a lower bound of $$\Omega (\log (k))$$ and ask whether the approximation guarantee is in fact $$\Theta (\log (k))$$. We present instances where complete linkage performs poorly in the sense that the k-clustering it computes is off by a factor of $$\Omega (k)$$ from an optimal solution, for both radius and diameter. We conclude that in general metric spaces complete linkage does not perform asymptotically better than single linkage, which merges the two clusters with the smallest inter-cluster distance and for which we prove an approximation guarantee of O(k).
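The two merge rules compared in the abstract can be sketched in a few lines. The following is a minimal illustration (not the authors' code, and deliberately naive, with no attempt at efficiency): given a pairwise distance matrix, it greedily merges clusters until k remain, using either the diameter of the merged cluster (complete linkage) or the smallest inter-cluster distance (single linkage) as the merge cost. The function name `agglomerate` and the list-of-lists cluster representation are illustrative choices, not from the paper.

```python
def agglomerate(dist, k, linkage="complete"):
    """Greedily merge clusters until only k clusters remain.

    dist: symmetric matrix where dist[i][j] is the distance between points i and j.
    linkage: "complete" scores a merge by the diameter of the union of the two
             clusters; "single" scores it by the minimum inter-cluster distance.
    Returns a list of clusters, each a list of point indices.
    """
    clusters = [[i] for i in range(len(dist))]
    while len(clusters) > k:
        best = None
        for a in range(len(clusters)):
            for b in range(a + 1, len(clusters)):
                if linkage == "complete":
                    # Diameter of the merged cluster: largest pairwise distance
                    # within the union of the two clusters.
                    union = clusters[a] + clusters[b]
                    cost = max(dist[i][j] for i in union for j in union)
                else:
                    # Single linkage: smallest distance across the two clusters.
                    cost = min(dist[i][j] for i in clusters[a] for j in clusters[b])
                if best is None or cost < best[0]:
                    best = (cost, a, b)
        _, a, b = best
        clusters[a] += clusters.pop(b)
    return clusters


# Toy example: four points on a line at positions 0, 1, 10, 11.
pts = [0, 1, 10, 11]
dist = [[abs(p - q) for q in pts] for p in pts]
print(agglomerate(dist, 2, "complete"))  # two well-separated pairs
print(agglomerate(dist, 2, "single"))
```

On this easy instance both rules recover the same 2-clustering; the paper's point is that on carefully constructed metric spaces the k-clustering produced by such greedy merging can be a factor $$\Omega (k)$$ worse than optimal.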
Funder
Deutsche Forschungsgemeinschaft
Rheinische Friedrich-Wilhelms-Universität Bonn
Publisher
Springer Science and Business Media LLC
Subject
Artificial Intelligence, Software