Abstract
Motivation
RNA-Seq data analysis is commonly biased towards detecting differentially expressed genes and insufficiently conveys the complexity of gene expression changes between biological conditions. This bias arises because discrete models of RNA-Seq count data cannot fully characterize the mean, variance, and skewness of the gene expression distribution using independent model parameters. A unified framework that simultaneously tests for differential expression, variability, and skewness is needed to realize the full potential of RNA-Seq data analysis in a systems biology context.

Results
We present SIEVE, a statistical methodology that provides the desired unified framework. SIEVE adopts a compositional data analysis framework that transforms discrete RNA-Seq counts to a continuous form whose distribution is well fitted by a skew-normal distribution. Simulation results show that SIEVE controls the false discovery rate and the probability of Type II error better than existing methods for differential expression analysis. Analysis of the Mayo RNA-Seq dataset for Alzheimer’s disease using SIEVE reveals that a gene set with significant differences in mean, standard deviation, and skewness of expression between the control group and the Alzheimer’s disease group strongly predicts a subject’s disease state. Furthermore, functional enrichment analysis shows that relying solely on differentially expressed genes detects only a segment of a much broader spectrum of biological aspects associated with Alzheimer’s disease. The remaining aspects can only be revealed using genes that show differential variability and skewness. Thus, SIEVE enables fresh perspectives for understanding the intricate changes in gene expression that occur in complex diseases.

Availability
The SIEVE R package and source code are available at https://github.com/Divo-Lee/SIEVE.
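The idea of moving counts to a continuous scale and then describing each gene by its mean, standard deviation, and skewness can be illustrated with a minimal sketch. It assumes a centered log-ratio (CLR) transform with a pseudocount of 0.5, which is a standard compositional data analysis choice but is not stated in the abstract, and it uses plain sample moments rather than a fitted skew-normal model. The function names `clr_transform` and `moments` and the toy counts are hypothetical; this is not the SIEVE package API.

```python
import math

def clr_transform(counts, pseudocount=0.5):
    """Centered log-ratio transform of one sample's count vector.

    The pseudocount (an assumption here) avoids log(0) for zero counts.
    """
    logs = [math.log(c + pseudocount) for c in counts]
    center = sum(logs) / len(logs)  # log of the geometric mean
    return [v - center for v in logs]

def moments(values):
    """Return (mean, standard deviation, skewness) using population moments."""
    n = len(values)
    mean = sum(values) / n
    m2 = sum((v - mean) ** 2 for v in values) / n
    m3 = sum((v - mean) ** 3 for v in values) / n
    sd = math.sqrt(m2)
    skew = m3 / sd ** 3 if sd > 0 else 0.0
    return mean, sd, skew

# Toy data (made up): rows are samples, columns are genes.
counts = [
    [10, 200, 35, 0],
    [12, 180, 40, 3],
    [8, 260, 30, 1],
    [15, 150, 90, 2],
]

# Transform each sample, then summarize each gene across samples.
clr = [clr_transform(row) for row in counts]
per_gene = [moments([row[g] for row in clr]) for g in range(len(counts[0]))]

for g, (mean, sd, skew) in enumerate(per_gene):
    print(f"gene {g}: mean={mean:.3f} sd={sd:.3f} skewness={skew:.3f}")
```

Comparing these three summaries between two groups of samples is the kind of contrast the abstract describes; the formal tests themselves are provided by the SIEVE R package and are not reproduced here.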
Publisher
Cold Spring Harbor Laboratory