Improving data interpretability with new differential sample variance gene set tests
Author:
Rahmatallah Yasir¹, Glazko Galina¹
Affiliation:
1. University of Arkansas for Medical Sciences
Abstract
Background
Gene set analysis methods have played a major role in generating biological interpretations from omics data such as gene expression datasets. However, most methods focus on detecting homogeneous pattern changes in mean expression, and methods that detect pattern changes in variance remain poorly explored. Although a few studies have attempted gene-level variance analysis, such approaches remain under-utilized. When two phenotypes are compared, gene sets with distinct changes in subgroups under one phenotype are overlooked by available methods, even though they reflect meaningful biological differences between the phenotypes. Multivariate sample-level variance analysis methods are needed to detect such pattern changes.
Results
We use ranking schemes based on the minimum spanning tree to generalize the univariate Cramér-von Mises and Anderson-Darling statistics into multivariate gene set analysis methods that detect differential sample variance or mean. We characterize these methods, together with two methods developed earlier, using simulations with different parameter settings. We apply the developed methods to a microarray gene expression dataset of prednisolone-resistant and prednisolone-sensitive children diagnosed with B-lineage acute lymphoblastic leukemia and to a bulk RNA-sequencing gene expression dataset of benign hyperplastic polyps and potentially malignant sessile serrated adenoma/polyps. In each of these datasets, one or both of the compared phenotypes have distinct molecular subtypes that contribute to heterogeneous differences. Our results show that methods designed to detect differential sample variance are able to detect specific hallmark signaling pathways that the available literature associates with the two compared phenotypes.
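The general idea of ranking multivariate samples along a minimum spanning tree and then applying a univariate two-sample statistic can be illustrated with a minimal Python sketch. It is not the GSAR implementation. The function names are hypothetical. The ordering used here (hop distance from a center node of the tree, ties broken by sample index) is a simplifying assumption standing in for the ranking schemes of the paper, which this abstract does not specify. Significance is assessed by permuting phenotype labels, which is also an assumption of this sketch.

```python
import math
import random
from collections import deque


def mst_adjacency(samples):
    """Prim's algorithm on the complete Euclidean graph; returns an adjacency list."""
    n = len(samples)
    in_tree = [False] * n
    best = [math.inf] * n
    parent = [-1] * n
    best[0] = 0.0
    adj = [[] for _ in range(n)]
    for _ in range(n):
        # Pick the closest sample that is not yet in the tree.
        u = min((i for i in range(n) if not in_tree[i]), key=lambda i: best[i])
        in_tree[u] = True
        if parent[u] >= 0:
            adj[u].append(parent[u])
            adj[parent[u]].append(u)
        for v in range(n):
            if not in_tree[v]:
                d = math.dist(samples[u], samples[v])
                if d < best[v]:
                    best[v], parent[v] = d, u
    return adj


def hop_depths(adj, root):
    """Breadth-first hop distance from root to every node of the tree."""
    depth = {root: 0}
    queue = deque([root])
    while queue:
        u = queue.popleft()
        for v in adj[u]:
            if v not in depth:
                depth[v] = depth[u] + 1
                queue.append(v)
    return [depth[i] for i in range(len(adj))]


def radial_order(adj):
    """Order samples by hop distance from a center node (minimum eccentricity).

    Assumption of this sketch: ties are broken by sample index.
    """
    n = len(adj)
    depths = [hop_depths(adj, r) for r in range(n)]
    root = min(range(n), key=lambda r: max(depths[r]))
    return sorted(range(n), key=lambda i: (depths[root][i], i))


def _cdf_gaps(ordered_labels):
    """Differences between the two empirical CDFs along the ordering (labels are 0/1)."""
    n1 = sum(ordered_labels)
    n0 = len(ordered_labels) - n1
    c0 = c1 = 0
    gaps = []
    for lab in ordered_labels:
        if lab == 1:
            c1 += 1
        else:
            c0 += 1
        gaps.append(c0 / n0 - c1 / n1)
    return n0, n1, gaps


def cvm_statistic(ordered_labels):
    """Two-sample Cramér-von Mises-type statistic on the ordered labels."""
    n0, n1, gaps = _cdf_gaps(ordered_labels)
    total = n0 + n1
    return n0 * n1 / total**2 * sum(g * g for g in gaps)


def ad_statistic(ordered_labels):
    """Two-sample Anderson-Darling-type statistic (tail-weighted squared gaps)."""
    n0, n1, gaps = _cdf_gaps(ordered_labels)
    total = n0 + n1
    return n0 * n1 * sum(
        g * g / (k * (total - k)) for k, g in enumerate(gaps[:-1], start=1)
    )


def permutation_test(samples, labels, statistic, n_perm=500, seed=0):
    """Return (observed statistic, permutation p-value) for one gene set."""
    order = radial_order(mst_adjacency(samples))
    observed = statistic([labels[i] for i in order])
    rng = random.Random(seed)
    shuffled = list(labels)
    hits = 0
    for _ in range(n_perm):
        rng.shuffle(shuffled)
        if statistic([shuffled[i] for i in order]) >= observed:
            hits += 1
    return observed, (hits + 1) / (n_perm + 1)
```

Here `samples` would hold one expression vector per sample, restricted to the genes of a single gene set, and `labels` the 0/1 phenotype of each sample. A center-outward ordering is intended to be sensitive to differences in spread: samples from a more dispersed phenotype tend to lie farther from the center of the tree.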
Conclusions
The results of this study demonstrate that methods designed to detect differential sample variance are useful for biological interpretation when biologically relevant but heterogeneous changes between two phenotypes are prevalent in specific signaling pathways. A software implementation of the developed methods, with detailed documentation, is available in the Bioconductor package GSAR. The methods are applicable to gene expression datasets in normalized matrix form and could be used with other omics datasets in normalized matrix form for which collections of feature sets are available.
Publisher
Springer Science and Business Media LLC