Authors:
Hodge Kenneth, Saethang Thammakorn
Abstract
Since its inception over 20 years ago, gene enrichment analysis has been largely associated with curated gene lists (e.g., Gene Ontology [GO]) constructed to represent various biological concepts: the cell cycle, cancer drivers, protein-protein interactions, etc. Researchers expect that comparing their own lab-generated lists with curated lists should yield insight. Despite the abundance of such curated lists, we show here that they rarely outperform existing individual lab-generated datasets when measured using standard statistical tests of study-to-study overlap. This demonstration is enabled by the WhatIsMyGene (WIMG) database, which we believe to be the single largest compendium of transcriptomic and microRNA perturbation data. The database also houses extensive proteomic, cell-type clustering, lncRNA, epitranscriptomic, and other data. Compared with enrichment tools that do incorporate specific lab studies in their underlying databases, WIMG generally performs better at the simple task of reflecting back to the user known aspects of the input set (cell type, type of perturbation, species, etc.), which increases confidence that unknown aspects of the input may also be revealed in the output. A limited number of GO lists are included in the database. However, these lists are assigned backgrounds, so that GO lists replete with abundant entities do not rise disproportionately to the top-ranking positions in the output. We describe a number of other features that should make WIMG indispensable for answering essential questions such as “What processes are embodied in my gene list?” and “What does my gene do?”
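The abstract does not name the overlap statistic WIMG uses or how its list backgrounds are built. As a minimal sketch, assuming a one-sided hypergeometric test (a common choice for gene-list overlap), the Python below shows why assigning a background matters: the same overlap can look significant against a whole genome yet unremarkable against a smaller universe of, say, expressed genes. The function name `overlap_pvalue` and the toy gene identifiers are hypothetical.

```python
from math import comb


def overlap_pvalue(list_a, list_b, background):
    """One-sided hypergeometric p-value, P(overlap >= observed), for two gene lists.

    Both lists are restricted to `background`, the universe of genes that
    could have appeared in either list. Illustrative sketch only; this is an
    assumed test, not WIMG's documented method.
    """
    universe = set(background)
    a = set(list_a) & universe
    b = set(list_b) & universe
    n_total, n_a, n_b = len(universe), len(a), len(b)
    k = len(a & b)  # observed overlap
    # Sum the hypergeometric probabilities of every overlap size from k upward.
    tail = sum(
        comb(n_a, i) * comb(n_total - n_a, n_b - i)
        for i in range(k, min(n_a, n_b) + 1)
    )
    return tail / comb(n_total, n_b)


# Hypothetical toy data: a 20-gene "genome" in which only g0..g3 are expressed.
genome = [f"g{i}" for i in range(20)]
expressed = genome[:4]
query = ["g0", "g1"]     # lab-generated list
curated = ["g0", "g1"]   # curated list made up of abundant genes

print(overlap_pvalue(query, curated, genome))     # 1/190, about 0.0053
print(overlap_pvalue(query, curated, expressed))  # 1/6, about 0.167
```

Against the full genome the two-gene overlap appears strongly enriched, but once the universe is limited to genes that could plausibly appear in both lists, the same overlap is close to what chance predicts. This is the kind of inflation that a list-specific background is meant to curb.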
Publisher
Cold Spring Harbor Laboratory