Abstract
Background: Statistical deconvolution strategies have emerged over the past decade to estimate the proportion of various cell populations in homogenate tissue sources like brain using gene expression data. However, no study has been undertaken to assess the extent to which expression-based and DNAm-based cell type composition estimates agree. Results: Using estimated neuronal fractions from DNAm data, from the same brain region (i.e., matched) as our bulk RNA-Seq dataset, as proxies for the true unobserved cell-type fractions (i.e., as the gold standard), we assessed the accuracy (RMSE) and concordance (R2) of four reference-based deconvolution algorithms: Houseman, CIBERSORT, non-negative least squares (NNLS)/MIND, and MuSiC. We did this for two cell-type populations - neurons and non-neurons/glia - using matched single nuclei RNA-Seq and mismatched single cell RNA-Seq reference datasets. With the mismatched single cell RNA-Seq reference dataset, Houseman, MuSiC, and NNLS produced concordant (high correlation; Houseman R2 = 0.51, 95% CI [0.39, 0.65]; MuSiC R2 = 0.56, 95% CI [0.43, 0.69]; NNLS R2 = 0.54, 95% CI [0.32, 0.68]) but biased (high RMSE, >0.35) neuronal fraction estimates. CIBERSORT produced more discordant (moderate correlation; R2 = 0.25, 95% CI [0.15, 0.38]) neuronal fraction estimates, but with less bias (low RSME, 0.09). Using the matched single nuclei RNA-Seq reference dataset did not eliminate bias (MuSiC RMSE = 0.17). Conclusions: Our results together suggest that many existing RNA deconvolution algorithms estimate the RNA composition of homogenate tissue, e.g. the amount of RNA attributable to each cell type, and not the cellular composition, which relates to the underlying fraction of cells.
Funder
National Institute on Drug Abuse
National Institute of Mental Health
Subject
General Pharmacology, Toxicology and Pharmaceutics,General Immunology and Microbiology,General Biochemistry, Genetics and Molecular Biology,General Medicine