Abstract
There is an increasing body of work exploring the integration of random projection into algorithms for numerical linear algebra. The primary motivation is to reduce the overall computational cost of processing large datasets. A suitably chosen random projection can be used to embed the original dataset in a lower-dimensional space such that key properties of the original dataset are retained. These algorithms are often referred to as sketching algorithms, as the projected dataset can be used as a compressed representation of the full dataset. We show that random matrix theory, in particular the Tracy–Widom law, is useful for describing the operating characteristics of sketching algorithms in the tall-data regime when the sample size n is much greater than the number of variables d. Asymptotic large sample results are of particular interest as this is the regime where sketching is most useful for data compression. In particular, we develop asymptotic approximations for the success rate in generating random subspace embeddings and the convergence probability of iterative sketching algorithms. We test a number of sketching algorithms on real large high-dimensional datasets and find that the asymptotic expressions give accurate predictions of the empirical performance.
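To make the idea of sketching concrete, the following is a minimal illustrative example, not the paper's implementation: it assumes a dense Gaussian sketching matrix, a tall least-squares problem, and arbitrarily chosen dimensions n, d, and sketch size k.

```python
import numpy as np

rng = np.random.default_rng(0)

# Tall-data regime: sample size n much greater than number of variables d.
n, d = 100_000, 20
X = rng.standard_normal((n, d))
beta = rng.standard_normal(d)
y = X @ beta + rng.standard_normal(n)

# Gaussian sketch: random projection from n rows down to k rows, with k << n.
k = 2_000
S = rng.standard_normal((k, n)) / np.sqrt(k)
X_sk, y_sk = S @ X, S @ y

# The least-squares fit on the sketched (compressed) data approximates
# the fit on the full dataset.
beta_full, *_ = np.linalg.lstsq(X, y, rcond=None)
beta_sketch, *_ = np.linalg.lstsq(X_sk, y_sk, rcond=None)
print(np.linalg.norm(beta_full - beta_sketch))
```

In this toy setting the sketched solution stays close to the full-data solution whenever the projection S acts as a subspace embedding for the column space of X; the probability of that event, as a function of k, n and d, is the kind of operating characteristic the paper approximates via the Tracy–Widom law.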
Funder
Alan Turing Institute
Medical Research Council
National Institute for Health Research
Publisher
Springer Science and Business Media LLC
Subject
Computational Theory and Mathematics; Statistics, Probability and Uncertainty; Statistics and Probability; Theoretical Computer Science