Affiliation:
1. IBM Research Europe, Zurich, and ETH Zurich, Zurich, Switzerland
2. University of Patras, Patras, Greece
Abstract
We present parallel algorithms and data structures for three fundamental operations in Numerical Linear Algebra: (i) Gaussian and CountSketch random projections and their combination, (ii) computation of the Gram matrix, and (iii) computation of the squared row norms of the product of two matrices, with a special focus on “tall-and-skinny” matrices, which arise in many applications. We provide a detailed analysis of the ubiquitous CountSketch transform and its combination with Gaussian random projections, accounting for memory requirements, computational complexity and workload balancing. We also demonstrate how these results can be applied to column subset selection, least squares regression and leverage scores computation. These tools have been implemented in pylspack, a publicly available Python package whose core is written in C++ and parallelized with OpenMP and that is compatible with standard matrix data structures of SciPy and NumPy. Extensive numerical experiments indicate that the proposed algorithms scale well and significantly outperform existing libraries for tall-and-skinny matrices.
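To make the three operations named in the abstract concrete, the following is a minimal serial sketch in plain NumPy and SciPy. It does not use the pylspack API and is not the authors' parallel implementation; the helper name `countsketch` and the sizes `n`, `d`, `r`, `m` are illustrative assumptions.

```python
import numpy as np
import scipy.sparse as sp


def countsketch(n, r, rng):
    """Build an r x n CountSketch matrix (hypothetical helper, not pylspack).

    Each column holds exactly one nonzero, equal to +1 or -1,
    placed in a uniformly random row.
    """
    rows = rng.integers(0, r, size=n)        # random target row per column
    signs = rng.choice([-1.0, 1.0], size=n)  # random sign per column
    cols = np.arange(n)
    return sp.csr_matrix((signs, (rows, cols)), shape=(r, n))


rng = np.random.default_rng(0)
n, d, r, m = 10_000, 8, 400, 64              # assumed sizes; A is tall-and-skinny
A = rng.standard_normal((n, d))

# (i) CountSketch, then a Gaussian projection applied to the smaller matrix.
S = countsketch(n, r, rng)
SA = S @ A                                   # r x d
G = rng.standard_normal((m, r)) / np.sqrt(m)
GSA = G @ SA                                 # m x d

# (ii) Gram matrix of A.
gram = A.T @ A                               # d x d

# (iii) Squared row norms of the product A @ B.
B = rng.standard_normal((d, 5))
AB = A @ B
sq_row_norms = np.einsum("ij,ij->i", AB, AB)
```

Applying the sparse CountSketch first reduces the row count from `n` to `r` at a cost proportional to the number of nonzeros of `A`, so the dense Gaussian matrix only needs to act on the much smaller `r x d` result.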
Publisher
Association for Computing Machinery (ACM)
Subject
Applied Mathematics, Software