Randomized near-neighbor graphs, giant components and applications in data science

Published: 2020-06 Issue: 2 Volume: 57 Pages: 458-476
ISSN: 0021-9002
Container-title: Journal of Applied Probability
Language: en
Short-container-title: J. Appl. Probab.

Author:

Jaffe Ariel, Kluger Yuval, Linderman George C., Mishne Gal, Steinerberger Stefan

Abstract

If we pick $n$ random points uniformly in $[0,1]^d$ and connect each point to its $c_d \log{n}$ nearest neighbors, where $d\ge 2$ is the dimension and $c_d$ is a constant depending on the dimension, then it is well known that the graph is connected with high probability. We prove that it suffices to connect every point to $c_{d,1} \log{\log{n}}$ points chosen randomly among its $c_{d,2} \log{n}$ nearest neighbors to ensure a giant component of size $n - o(n)$ with high probability. This construction yields a much sparser random graph with $\sim n \log\log{n}$ instead of $\sim n \log{n}$ edges that has comparable connectivity properties. This result has non-trivial implications for problems in data science where an affinity matrix is constructed: instead of connecting each point to its $k$ nearest neighbors, one can often pick $k'\ll k$ random points out of the $k$ nearest neighbors and only connect to those without sacrificing quality of results. This approach can simplify and accelerate computation; we illustrate this with experimental results in spectral clustering of large-scale datasets.
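The construction described in the abstract can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not the authors' implementation: the function names (`knn_indices`, `sparse_random_knn_edges`, `largest_component_size`) are hypothetical, the constants $c_{d,1} = c_{d,2} = 2$ are arbitrary guesses rather than values from the paper, and the neighbor search is brute force, which is suitable only for small $n$.

```python
import math
import random


def knn_indices(points, k):
    """Brute-force k nearest neighbors of every point (the point itself excluded)."""
    neighbors = []
    for i, p in enumerate(points):
        dists = sorted(
            (math.dist(p, q), j) for j, q in enumerate(points) if j != i
        )
        neighbors.append([j for _, j in dists[:k]])
    return neighbors


def sparse_random_knn_edges(points, k, k_prime, rng):
    """Connect each point to k_prime random picks among its k nearest neighbors."""
    edges = set()
    for i, nbrs in enumerate(knn_indices(points, k)):
        for j in rng.sample(nbrs, k_prime):
            edges.add((min(i, j), max(i, j)))  # store as an undirected edge
    return edges


def largest_component_size(n, edges):
    """Size of the largest connected component, computed with union-find."""
    parent = list(range(n))

    def find(x):
        while parent[x] != x:
            parent[x] = parent[parent[x]]  # path halving
            x = parent[x]
        return x

    for a, b in edges:
        ra, rb = find(a), find(b)
        if ra != rb:
            parent[ra] = rb

    sizes = {}
    for v in range(n):
        r = find(v)
        sizes[r] = sizes.get(r, 0) + 1
    return max(sizes.values())


rng = random.Random(0)
n, d = 400, 2
points = [tuple(rng.random() for _ in range(d)) for _ in range(n)]

# Assumed constants: c_{d,2} = 2 and c_{d,1} = 2 are illustrative guesses.
k = max(2, math.ceil(2 * math.log(n)))
k_prime = max(2, math.ceil(2 * math.log(math.log(n))))

edges = sparse_random_knn_edges(points, k, k_prime, rng)
giant = largest_component_size(n, edges)
print(f"k={k}, k'={k_prime}, edges={len(edges)}, largest component={giant}/{n}")
```

The graph has at most $n k'$ edges, compared with up to $n k$ for the full $k$-nearest-neighbor graph. Printing the largest component size for a few values of $n$ gives an informal check of the giant-component claim, though a small simulation like this is no substitute for the paper's proof.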

Publisher

Cambridge University Press (CUP)

Subject

Statistics, Probability and Uncertainty; General Mathematics; Statistics and Probability

References: 46 articles.

1. Yukich, J. (1998). Probability Theory of Classical Euclidean Optimization Problems (Lecture Notes Math. 1675). Springer, Berlin.

2. A tutorial on spectral clustering

3. Visualizing data using t-SNE;van der Maaten;J. Machine Learning Res.,2008

4. Accelerating t-SNE using tree-based algorithms;van der Maaten;J. Machine Learning Res.,2014

5. k-Nearest-Neighbor Clustering and Percolation Theory

Cited by 3 articles.

1. Scalability and robustness of spectral embedding: landmark diffusion is all you need;Information and Inference: A Journal of the IMA;2022-08-02

2. Scalable Algorithms for Convex Clustering;2021 IEEE Data Science and Learning Workshop (DSLW);2021-06-05

3. Detection of differentially abundant cell subpopulations in scRNA-seq data;Proceedings of the National Academy of Sciences;2021-05-17