Author:
Lee Donghyung,Cheng Anthony,Bolisetty Mohan,Ucar Duygu
Abstract
AbstractSingle cell RNA-sequencing (scRNA-seq) precisely characterize gene expression levels and dissect variation in expression associated with the state (technical or biological) and the type of the cell, which is averaged out in bulk measurements. Multiple and correlated sources contribute to gene expression variation in single cells, which makes their estimation difficult with the existing methods developed for bulk measurements (e.g., surrogate variable analysis (SVA)) that estimate orthogonal transformations of these sources. We developed iteratively adjusted surrogate variable analysis (IA-SVA) that can estimate hidden and correlated sources of variation by identifying a set of genes affected with each hidden factor in an iterative manner. Analysis of scRNA-seq data from human cells showed that IA-SVA could accurately capture hidden variation arising from technical (e.g., stacked doublet cells) or biological sources (e.g., cell type or cell-cycle stage). Furthermore, IA-SVA delivers a set of genes associated with the detected hidden source to be used in downstream data analyses. As a proof of concept, IA-SVA recapitulated known marker genes for islet cell subsets (e.g., alpha, beta), which improved the grouping of subsets into distinct clusters. Taken together, IA-SVA is an effective and novel method to dissect multiple and correlated sources of variation in scRNA-seq data.
Publisher
Cold Spring Harbor Laboratory