Abstract
AbstractExtraction of meaningful biological insight from gene expression profiling often focuses on the identification of statistically enriched terms or pathways. These methods typically use gene sets as input data, and subsequently return overrepresented terms along with associated statistics describing their enrichment. This approach does not cater to analyses focused on a single gene-of-interest, particularly when the gene lacks prior functional characterization. To address this, we formulatedGeneCOCOA, a method which utilizes context-specific gene co-expression and curated functional gene sets, but focuses on a user-supplied gene-of-interest. The co-expression between the gene-of-interest and subsets of genes from functional groups (e.g. pathways, GO terms) is derived using linear regression, and resulting root-mean-square error values are compared against background values obtained from randomly selected genes. The resultingpvalues provide a statistical ranking of functional gene sets from any collection, along with their associated terms, based on their co-expression with the gene of interest in a manner specific to the context and experiment.GeneCOCOAthereby provides biological insight into both gene function, and putative regulatory mechanisms by which the expression of the gene-of-interest is controlled. Despite its relative simplicity,GeneCOCOAoutperforms similar methods in the accurate recall of known gene-disease associations.GeneCOCOAis formulated as an R package for ease-of-use, available athttps://github.com/si-ze/geneCOCOA.Author summaryUnderstanding the biological functions of different genes and their respective products is a key element of modern biological research. While one can examine the relative abundance of a gene product in transcriptomics data, this alone does not provide any clue to the biological relevance of the gene. Using a type of analysis called co-expression, it is possible to identify other genes which have similar patterns of regulation to a gene-of-interest, but again, this cannot tell you what a gene does. Genes whose function has previously been studied are often assembled into groups (e.g. pathways, ontologies), which can be used to annotate gene sets of interest. However, if a gene has not yet been characterized, it will not appear in these gene set enrichment analyses. Here, we propose a new method -GeneCOCOA- which uses co-expression of a single gene with genes in functional groups to identify which functional group a gene is most similar too, resulting in a putative function for the gene, even if it has not been studied before. We testedGeneCOCOAby using it to find gene-disease links which have already been scientifically studied, and showed thatGeneCOCOAcan do this more effectively than other available methods.
Publisher
Cold Spring Harbor Laboratory