Abstract
Background
Gene-gene co-expression correlations measured by mRNA-sequencing (RNA-seq) can be used to predict gene annotations based on the co-variance structure within these data. In our prior work, we showed that uniformly aligned RNA-seq co-expression data from thousands of diverse studies is highly predictive of both gene annotations and protein-protein interactions. However, the performance of the predictions varies depending on whether the gene annotations and interactions are cell type and tissue specific or agnostic. Tissue and cell type-specific gene-gene co-expression data can be useful for making more accurate predictions because many genes perform their functions in unique ways in different cellular contexts. However, identifying the optimal tissues and cell types to partition the global gene-gene co-expression matrix is challenging.
Results
Here we introduce and validate an approach called PRediction of gene Insights from Stratified Mammalian gene co-EXPression (PrismEXP) for improved gene annotation predictions based on RNA-seq gene-gene co-expression data. Using uniformly aligned data from ARCHS4, we apply PrismEXP to predict a wide variety of gene annotations including pathway membership, Gene Ontology terms, as well as human and mouse phenotypes. Predictions made with PrismEXP outperform predictions made with the global cross-tissue co-expression correlation matrix approach on all tested domains, and training using one annotation domain can be used to predict annotations in other domains.
Conclusions
By demonstrating the utility of PrismEXP predictions in multiple use cases we show how PrismEXP can be used to enhance unsupervised machine learning methods to better understand the roles of understudied genes and proteins. To make PrismEXP accessible, it is provided via a user-friendly web interface, a Python package, and an Appyter. AVAILABILITY. The PrismEXP web-based application, with pre-computed PrismEXP predictions, is available from: https://maayanlab.cloud/prismexp; PrismEXP is also available as an Appyter: https://appyters.maayanlab.cloud/PrismEXP/; and as Python package: https://github.com/maayanlab/prismexp.
Funder
National Institutes of Health
Subject
General Agricultural and Biological Sciences,General Biochemistry, Genetics and Molecular Biology,General Medicine,General Neuroscience
Reference45 articles.
1. Phenotypic spectrum of ALPK3-related cardiomyopathy;Al Senaidi;American Journal of Medical Genetics Part A,2019
2. Prediction of human disease genes by human-mouse conserved coexpression analysis;Ala;PLOS Computational Biology,2008
3. Approaches for extracting practical information from gene co-expression networks in plant biology;Aoki;Plant and Cell Physiology,2007
4. Gene Ontology: tool for the unification of biology;Ashburner;Nature Genetics,2000
5. Linear discriminant analysis-a brief tutorial;Balakrishnama;Institute for Signal and information Processing,1998
Cited by
1 articles.
订阅此论文施引文献
订阅此论文施引文献,注册后可以免费订阅5篇论文的施引文献,订阅后可以查看论文全部施引文献