Abstract
Summary
Motivated by theoretical and practical issues that arise when applying Principal Components Analysis (PCA) to count data, Townes et al. introduced “Poisson GLM-PCA”, a variation of PCA adapted to count data, as a tool for dimensionality reduction of single-cell RNA sequencing (RNA-seq) data. However, fitting GLM-PCA is computationally challenging. Here we study this problem and show that a simple algorithm, which we call “Alternating Poisson Regression” (APR), produces better-quality fits in less time than existing algorithms. APR is also memory-efficient and lends itself to parallel implementation on multi-core processors, both of which help when handling large single-cell RNA-seq data sets. We illustrate the benefits of this approach on two published single-cell RNA-seq data sets. The new algorithms are implemented in an R package, fastglmpca.

Availability and implementation
The fastglmpca R package is released on CRAN for Windows, macOS and Linux, and the source code is available at github.com/stephenslab/fastglmpca under the open source GPL-3 license. Scripts to reproduce the results in this paper are also available in the GitHub repository.

Contact
mstephens@uchicago.edu

Supplementary information
Supplementary data are available at bioRxiv online.
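The Summary names Alternating Poisson Regression (APR) without spelling out the idea. What follows is a minimal sketch, not the fastglmpca implementation. It assumes the plain Poisson GLM-PCA model Y_ij ~ Poisson(exp((L Fᵀ)_ij)) with no row or column offsets. It also assumes each inner Poisson regression is handled by a single damped Newton step per row; fastglmpca may use a different inner solver. The names `row_loglik`, `update_rows` and `fit_glmpca_apr` are hypothetical. The sketch alternates between refitting L with F fixed and refitting F with L fixed.

```python
import numpy as np


def row_loglik(y, X, b):
    """Poisson log-likelihood of counts y under log-rates X @ b (log y! constant dropped)."""
    eta = X @ b
    return float(np.sum(y * eta - np.exp(eta)))


def update_rows(Y, B, X, ridge=1e-8):
    """Hypothetical helper: for each row i, take one damped Newton step on the
    Poisson regression Y[i, :] ~ Poisson(exp(X @ B[i])), holding X fixed."""
    B = B.copy()
    for i in range(B.shape[0]):
        y, b = Y[i], B[i]
        mu = np.exp(X @ b)
        grad = X.T @ (y - mu)                   # gradient of the row log-likelihood
        hess = X.T @ (X * mu[:, None])          # negative Hessian: X' diag(mu) X
        step = np.linalg.solve(hess + ridge * np.eye(X.shape[1]), grad)
        f0, t = row_loglik(y, X, b), 1.0
        # Step halving: accept only a step that does not lower this row's log-likelihood.
        while t > 1e-10:
            cand = b + t * step
            if row_loglik(y, X, cand) >= f0:
                B[i] = cand
                break
            t *= 0.5
    return B


def fit_glmpca_apr(Y, K, n_iter=50, seed=0):
    """Sketch of alternating Poisson regression for Y ~ Poisson(exp(L @ F.T))."""
    rng = np.random.default_rng(seed)
    n, m = Y.shape
    L = 0.01 * rng.standard_normal((n, K))
    F = 0.01 * rng.standard_normal((m, K))
    eta = L @ F.T
    history = [float(np.sum(Y * eta - np.exp(eta)))]
    for _ in range(n_iter):
        # With F fixed, row i of L is a Poisson regression of Y[i, :] on F.
        L = update_rows(Y, L, F)
        # With L fixed, row j of F is a Poisson regression of Y[:, j] on L.
        F = update_rows(Y.T, F, L)
        eta = L @ F.T
        history.append(float(np.sum(Y * eta - np.exp(eta))))
    return L, F, history
```

Once F (or L) is fixed, the log-likelihood splits into independent per-row terms. Each row update therefore cannot lower the total, and `history` is non-decreasing. The loop over rows inside `update_rows` could also be spread across cores, which fits the Summary's point about parallel implementation.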
Publisher
Cold Spring Harbor Laboratory