Abstract
Differential gene expression analysis of bulk RNA sequencing data plays a major role in the diagnosis, prognosis, and understanding of disease. Such analyses are often challenging due to a lack of good controls and the heterogeneous nature of the samples. Here, we present a deep generative model that can replace control samples. The model is trained on RNA-seq data from healthy tissues and learns a low-dimensional representation that clusters tissues very well without supervision. When applied to cancer samples, the model accurately identifies representations close to the tissue of origin. We interpret these inferred representations as the closest normal to the disease samples and use the resulting count distributions to perform differential expression analysis of single cancer samples without control samples. In a detailed analysis of breast cancer, we demonstrate how our approach finds subtype-specific cancer driver and marker genes with high specificity and greatly outperforms DESeq2, the state-of-the-art method for detecting differentially expressed genes. We further show that the significant genes found using the model are highly enriched within cancer-specific driver genes across different cancer types. Our results show that the in silico closest normal provides a more favorable comparison than control samples.
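The single-sample test described above can be illustrated with a minimal sketch. It assumes, hypothetically, that the closest-normal count distribution for each gene is a negative binomial with a predicted mean and dispersion; the abstract does not specify the distribution family or the test statistic, so the function names, the parameterization, and the two-sided p-value definition below are illustrative assumptions rather than the authors' implementation.

```python
import math


def nb_log_pmf(k, mean, dispersion):
    """Log-probability of count k under a negative binomial with the given
    mean and variance mean + dispersion * mean**2 (assumed parameterization)."""
    r = 1.0 / dispersion  # "size" parameter
    return (
        math.lgamma(k + r) - math.lgamma(r) - math.lgamma(k + 1)
        + r * math.log(r / (r + mean))
        + k * math.log(mean / (r + mean))
    )


def two_sided_p(observed, mean, dispersion, tol=1e-12):
    """Sum the probabilities of all counts that are no more likely than the
    observed count under the (hypothetical) closest-normal distribution."""
    log_obs = nb_log_pmf(observed, mean, dispersion)
    total = 0.0
    k = 0
    while True:
        lp = nb_log_pmf(k, mean, dispersion)
        if lp <= log_obs + 1e-9:
            term = math.exp(lp)
            total += term
            # Past both the observed count and the mean the pmf only decreases,
            # so stop once further terms no longer change the sum.
            if k > max(observed, mean) and term <= tol * total:
                break
        k += 1
    return min(1.0, total)


# Hypothetical gene: the model predicts mean 100 and dispersion 0.1.
p_near = two_sided_p(100, mean=100.0, dispersion=0.1)   # count close to normal
p_far = two_sided_p(1000, mean=100.0, dispersion=0.1)   # strongly elevated count
print(p_near, p_far)
```

In practice such a test would be run for every gene in a sample, followed by multiple-testing correction; those steps are omitted here.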
Publisher
Cold Spring Harbor Laboratory