Fast computation of principal components of genomic similarity matrices

Published: 2022-10-08

Author:

Hahn Georg, Lutz Sharon M., Hecker Julian, Prokopenko Dmitry, Cho Michael H., Silverman Edwin K., Weiss Scott T., Lange Christoph

Abstract

The computation of a similarity measure for genomic data, for instance using the (genomic) covariance matrix, the Jaccard matrix, or the genomic relationship matrix (GRM), is a standard tool in computational genetics. The principal components of such matrices are routinely used to correct for biases in, for instance, linear regressions. However, computing both a similarity matrix and its singular value decomposition (SVD) is computationally intensive. The contribution of this article is threefold. First, we demonstrate that the calculation of three matrices (the genomic covariance matrix, the weighted Jaccard matrix, and the genomic relationship matrix) can be reformulated in a unified way which allows for an exact, faster SVD computation. An exception is the Jaccard matrix, which does not have a structure applicable for the fast SVD computation. An exact algorithm is proposed to compute the principal components of the genomic covariance, weighted Jaccard, and genomic relationship matrices. The algorithm is adapted from an existing randomized SVD algorithm and ensures that all computations are carried out in sparse matrix algebra. Second, an approximate Jaccard matrix is introduced to which the fast SVD computation is applicable. Third, we establish guaranteed theoretical bounds on the distance (in L2 norm and angle) between the principal components of the Jaccard matrix and those of our proposed approximation, thus putting the proposed Jaccard approximation on a solid mathematical foundation. We illustrate all computations on both simulated data and data of the 1000 Genomes Project, showing that the approximation error is very low in practice.
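
To illustrate the general idea of obtaining principal components without ever forming a dense similarity matrix, here is a minimal Python sketch. It is not the algorithm from the article: it uses SciPy's ARPACK-based `svds` rather than a randomized SVD, and it assumes a particular definition of the genomic covariance matrix (sample covariance between the columns of a sparse genotype matrix `X`, with each column centered by its own mean and a divisor of `m - 1`). The function name `top_covariance_pcs` and the demo data are hypothetical. The centered matrix would be dense, so the centering is applied implicitly inside the matrix-vector products and only sparse products with `X` are carried out.

```python
import numpy as np
import scipy.sparse as sp
from scipy.sparse.linalg import LinearOperator, svds


def top_covariance_pcs(X, k):
    """Top-k eigenpairs of the covariance between the columns of sparse X.

    Assumed definition: C = Y^T Y / (m - 1), where Y = X - 1 * c^T and
    c holds the column means of the m-by-n matrix X.
    """
    m, n = X.shape
    col_means = np.asarray(X.mean(axis=0)).ravel()  # vector c, length n

    # Y is dense, so it is never formed; only its action on vectors is defined.
    def matvec(v):
        v = np.asarray(v).ravel()
        return X @ v - np.full(m, col_means @ v)  # Y v = X v - 1 (c . v)

    def rmatvec(u):
        u = np.asarray(u).ravel()
        return X.T @ u - col_means * u.sum()  # Y^T u = X^T u - c (1 . u)

    Y = LinearOperator((m, n), matvec=matvec, rmatvec=rmatvec, dtype=np.float64)

    # Right singular vectors of Y are the eigenvectors of C.
    _, s, vt = svds(Y, k=k)
    order = np.argsort(s)[::-1]  # sort by decreasing singular value
    eigvals = s[order] ** 2 / (m - 1)
    return eigvals, vt[order].T


# Hypothetical demo data: 200 loci, 30 individuals, about 10% nonzero entries.
rng = np.random.default_rng(0)
X_demo = sp.csr_matrix((rng.random((200, 30)) < 0.1).astype(np.float64))
eigvals, pcs = top_covariance_pcs(X_demo, k=3)
print(eigvals)
```

On small inputs the result can be checked against a dense eigendecomposition of `np.cov(X.toarray(), rowvar=False)`; eigenvectors agree only up to sign.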

Publisher

Cold Spring Harbor Laboratory

References (26 articles)

1. Fast Principal Component Analysis of Large-Scale Genome-Wide Data

2. Demonstrating stratification in a European American population

3. The Rotation of Eigenvectors by a Perturbation. III

4. A Simple and Improved Correction for Population Stratification in Case-Control Studies

5. Über die Abgrenzung der Eigenwerte einer Matrix; Izv. Akad. Nauk. USSR Otd. Fiz.-Mat. Nauk, 1931