Randomized near-neighbor graphs, giant components and applications in data science
Published: 2020-06
Issue: 2
Volume: 57
Pages: 458-476
ISSN: 0021-9002
Container-title: Journal of Applied Probability
Short-container-title: J. Appl. Probab.
Language: en
Authors: Jaffe, Ariel; Kluger, Yuval; Linderman, George C.; Mishne, Gal; Steinerberger, Stefan
Abstract
If we pick $n$ random points uniformly in $[0,1]^d$ and connect each point to its $c_d \log{n}$ nearest neighbors, where $d\ge 2$ is the dimension and $c_d$ is a constant depending on the dimension, then it is well known that the graph is connected with high probability. We prove that it suffices to connect every point to $c_{d,1} \log{\log{n}}$ points chosen randomly among its $c_{d,2} \log{n}$ nearest neighbors to ensure a giant component of size $n - o(n)$ with high probability. This construction yields a much sparser random graph with $\sim n \log{\log{n}}$ instead of $\sim n \log{n}$ edges that has comparable connectivity properties. This result has non-trivial implications for problems in data science where an affinity matrix is constructed: instead of connecting each point to its $k$ nearest neighbors, one can often pick $k'\ll k$ random points out of the $k$ nearest neighbors and connect only to those without sacrificing the quality of results. This approach can simplify and accelerate computation; we illustrate this with experimental results on spectral clustering of large-scale datasets.
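The sketch below illustrates the construction suggested by the abstract: build a $k$-nearest-neighbor index, keep only $k'\ll k$ neighbors drawn uniformly at random from each point's $k$ nearest neighbors, and feed the resulting sparse affinity to spectral clustering. This is not the authors' code; the function name randomized_knn_affinity and the values k=50 and k_prime=10 are illustrative assumptions, not the paper's constants $c_{d,1}\log\log n$ and $c_{d,2}\log n$.

# Minimal sketch (assumed helper, not the paper's implementation) of a
# randomized near-neighbor affinity followed by spectral clustering.
import numpy as np
from scipy.sparse import csr_matrix
from sklearn.cluster import SpectralClustering
from sklearn.datasets import make_blobs
from sklearn.neighbors import NearestNeighbors

def randomized_knn_affinity(X, k=50, k_prime=10, seed=0):
    """Symmetric 0/1 affinity: each point connects to k_prime neighbors
    chosen uniformly at random from its k nearest neighbors (self excluded)."""
    rng = np.random.default_rng(seed)
    n = X.shape[0]
    # Query k + 1 neighbors because the nearest neighbor of each point is itself.
    _, idx = NearestNeighbors(n_neighbors=k + 1).fit(X).kneighbors(X)
    idx = idx[:, 1:]                                   # drop the self-neighbor
    rows = np.repeat(np.arange(n), k_prime)
    cols = np.concatenate([rng.choice(row, size=k_prime, replace=False)
                           for row in idx])
    A = csr_matrix((np.ones(rows.size), (rows, cols)), shape=(n, n))
    return A.maximum(A.T)                              # symmetrize the graph

if __name__ == "__main__":
    # Toy data; the paper's experiments concern much larger datasets.
    X, _ = make_blobs(n_samples=2000, centers=4, random_state=0)
    A = randomized_knn_affinity(X)
    labels = SpectralClustering(n_clusters=4, affinity="precomputed",
                                random_state=0).fit_predict(A)
    print("edges:", A.nnz // 2, "cluster sizes:", np.bincount(labels))

The subsampled graph has on the order of n*k_prime edges rather than n*k, which is the practical payoff the abstract describes; symmetrizing with A.maximum(A.T) keeps the affinity undirected, as spectral clustering expects.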
Publisher
Cambridge University Press (CUP)
Subject
Statistics, Probability and Uncertainty; General Mathematics; Statistics and Probability
Cited by: 3 articles