Low-Rank Approximation and Regression in Input Sparsity Time

Author:

Kenneth L. Clarkson¹, David P. Woodruff¹

Affiliation:

1. IBM Research, Almaden, Harry Road, San Jose, CA

Abstract

We design a new distribution over m × n matrices S so that, for any fixed n × d matrix A of rank r, with probability at least 9/10, ∥SAx∥₂ = (1 ± ε)∥Ax∥₂ simultaneously for all x ∈ ℝᵈ. Here, m is bounded by a polynomial in rε⁻¹, and the parameter ε ∈ (0, 1]. Such a matrix S is called a subspace embedding. Furthermore, SA can be computed in O(nnz(A)) time, where nnz(A) is the number of nonzero entries of A. This improves over all previous subspace embeddings, for which computing SA required at least Ω(nd log d) time. We call these S sparse embedding matrices.

Using our sparse embedding matrices, we obtain the fastest known algorithms for overconstrained least-squares regression, low-rank approximation, approximating all leverage scores, and ℓ_p regression. More specifically, let b be an n × 1 vector, ε > 0 a small enough value, and integers k, p ⩾ 1. Our results include the following.

Regression: The regression problem is to find a d × 1 vector x′ for which ∥Ax′ − b∥_p ⩽ (1 + ε) min_x ∥Ax − b∥_p. For the Euclidean case p = 2, we obtain an algorithm running in O(nnz(A)) + Õ(d³ε⁻²) time, and another in O(nnz(A) log(1/ε)) + Õ(d³ log(1/ε)) time. (Here, Õ(f) = f · log^{O(1)}(f).) For p ∈ [1, ∞), more generally, we obtain an algorithm running in O(nnz(A) log n) + O(rε⁻¹)^C time, for a fixed constant C.

Low-rank approximation: We give an algorithm to obtain a rank-k matrix Âₖ such that ∥A − Âₖ∥_F ⩽ (1 + ε)∥A − Aₖ∥_F, where Aₖ is the best rank-k approximation to A. (That is, Aₖ is the output of principal components analysis, produced by a truncated singular value decomposition, useful for latent semantic indexing and many other statistical problems.) Our algorithm runs in O(nnz(A)) + Õ(nk²ε⁻⁴ + k³ε⁻⁵) time.

Leverage scores: We give an algorithm to estimate the leverage scores of A, up to a constant factor, in O(nnz(A) log n) + Õ(r³) time.
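A minimal sketch of the idea behind the abstract, assuming a CountSketch-style construction for S (each column of S has exactly one nonzero entry, ±1 at a uniformly random row), used in the standard "sketch-and-solve" pattern for least-squares regression; the helper name `sparse_embedding` and the specific dimensions are illustrative choices, not from the paper:

```python
import numpy as np

def sparse_embedding(A, m, rng):
    """Apply a CountSketch-style sparse embedding S to A, returning S @ A.

    Each of the n coordinates is hashed to one of m rows with a random
    sign, so S @ A costs O(nnz(A)) work: every row of A touches exactly
    one row of the output.
    """
    n = A.shape[0]
    rows = rng.integers(0, m, size=n)          # hash each coordinate to a row
    signs = rng.choice([-1.0, 1.0], size=n)    # random +/-1 sign per coordinate
    SA = np.zeros((m, A.shape[1]))
    for i in range(n):
        SA[rows[i]] += signs[i] * A[i]
    return SA

# Sketch-and-solve: minimize ||S(Ax - b)||_2 instead of ||Ax - b||_2.
# With m = poly(d/eps) rows, the subspace-embedding property makes the
# sketched solution (1 + eps)-optimal with constant probability.
rng = np.random.default_rng(0)
n, d = 2000, 10
A = rng.standard_normal((n, d))
b = A @ rng.standard_normal(d) + 0.1 * rng.standard_normal(n)

m = 400  # illustrative sketch size, much smaller than n
SAb = sparse_embedding(np.hstack([A, b[:, None]]), m, rng)
SA, Sb = SAb[:, :d], SAb[:, d]

x_sketch, *_ = np.linalg.lstsq(SA, Sb, rcond=None)  # small m x d problem
x_exact, *_ = np.linalg.lstsq(A, b, rcond=None)     # full n x d problem
```

The loop over rows is written for clarity; in practice S @ A would be applied via a single pass over the nonzeros of A (e.g. with `np.add.at` or a sparse-matrix product), which is what gives the O(nnz(A)) running time.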

Funder

Defense Advanced Research Projects Agency

Air Force Research Laboratory

XDATA

Publisher

Association for Computing Machinery (ACM)

Subject

Artificial Intelligence, Hardware and Architecture, Information Systems, Control and Systems Engineering, Software


Cited by 110 articles.

1. Improving compressed matrix multiplication using control variate method;Information Processing Letters;2025-01

2. Accelerated Double-Sketching Subspace Newton;European Journal of Operational Research;2024-12

3. On the Consistency and Large-Scale Extension of Multiple Kernel Clustering;IEEE Transactions on Pattern Analysis and Machine Intelligence;2024-10

4. Recent and Upcoming Developments in Randomized Numerical Linear Algebra for Machine Learning;Proceedings of the 30th ACM SIGKDD Conference on Knowledge Discovery and Data Mining;2024-08-24

5. Statistical inference for sketching algorithms;Information and Inference: A Journal of the IMA;2024-07-01
