Affiliation:
1. Aalto University, Finland
Abstract
We study the problem of detecting
outlier pairs
of strongly correlated variables among a collection of
n
variables with otherwise weak pairwise correlations. After normalization, this task amounts to the geometric task where we are given as input a set of
n
vectors with unit Euclidean norm and dimension
d
, and for some constants 0<τ < ρ < 1, we are asked to find all the outlier pairs of vectors whose inner product is at least ρ in absolute value, subject to the promise that all but at most
q
pairs of vectors have inner product at most τ in absolute value.
Improving on an algorithm of Valiant [FOCS 2012; J. ACM 2015], we present a randomized algorithm that for Boolean inputs ({ −1,1}-valued data normalized to unit Euclidean length) runs in time Õ((
n
max,{ 1−γ +
M
(Δ γ ,γ),
M
(1−γ ,2 Δ γ)}
+
qdn
2γ
), where 0<γ < 1 is a constant tradeoff parameter and
M
(μ, ν) is the exponent to multiply an ⌊
n
μ
⌋ × ⌊
n
ν
⌋ matrix with an ⌊
n
ν
⌋ × ⌊
n
μ
⌋ matrix and Δ =1/(1−log
τ
ρ). As corollaries we obtain randomized algorithms that run in time Õ( (
n
2/ω 3−log
τ
ρ
+
qdn
2/(1−log
τ
ρ)3=log
ττ
ρ
) and in time õ( (
n
4 / 2+α (1−log
τ
ρ)
+
qdn
2/α (1−log
τ
ρ)2+α (1−log
τ
ρ)
>), where 2≤ ω <2.38 is the exponent for square matrix multiplication and 0.3<α ≤ 1 is the exponent for rectangular matrix multiplication. The notation Õ(ṡ) hides polylogarithmic factors in
n
and
d
whose degree may depend on ρ and τ. We present further corollaries for the light bulb problem and for learning sparse Boolean functions.
Funder
European Research Council under the European Union's Seventh Framework Programme
ERC
Publisher
Association for Computing Machinery (ACM)
Subject
Mathematics (miscellaneous)
Cited by
6 articles.
订阅此论文施引文献
订阅此论文施引文献,注册后可以免费订阅5篇论文的施引文献,订阅后可以查看论文全部施引文献