Abstract
Motivation
Our study aimed to identify biologically relevant transcription factors (TFs) that control the expression of a set of co-expressed or co-regulated genes.
Results
We developed a fully automated pipeline, Motif Over Representation Analysis (MORA), to detect enrichment of known TF binding motifs in any set of query sequences. MORA performed better than or comparably to five other TF-prediction tools, as evaluated using hundreds of differentially expressed gene sets and ChIP-seq datasets derived from known TFs. Additionally, we developed EnsembleTFpredictor to harness the power of multiple TF-prediction tools and provide a list of functional TFs ranked by prediction confidence. When applied to the test datasets, EnsembleTFpredictor not only identified the target TF but also revealed many TFs known to cooperate with the target TF in the corresponding biological systems. MORA and EnsembleTFpredictor have been used in two publications, demonstrating their power in guiding experimental design and in revealing novel biological insights.
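As a rough illustration of what motif over-representation means, the sketch below computes a one-sided hypergeometric p-value for finding a motif in at least as many query sequences as observed, given how often it occurs in a background set that contains the query. The abstract does not state which statistical test MORA uses, so the choice of test, the function name `hypergeom_enrichment_p`, and all parameter names are assumptions made for illustration only.

```python
from math import comb


def hypergeom_enrichment_p(hits_query, n_query, hits_background, n_background):
    """One-sided hypergeometric p-value, P(X >= hits_query).

    Hypothetical helper, not MORA's actual implementation.
    n_background    : total sequences in the background (query included)
    hits_background : background sequences containing the motif
    n_query         : sequences in the query set
    hits_query      : query sequences containing the motif
    """
    total = comb(n_background, n_query)
    upper = min(n_query, hits_background)
    # Sum the probability of every outcome at least as extreme as observed.
    tail = sum(
        comb(hits_background, k) * comb(n_background - hits_background, n_query - k)
        for k in range(hits_query, upper + 1)
    )
    return tail / total


# Toy numbers: 2 of 4 background sequences carry the motif, and both
# sequences in a query of size 2 carry it.
p_value = hypergeom_enrichment_p(2, 2, 2, 4)
print(p_value)
```

A small p-value suggests the motif occurs in the query more often than expected by chance. A real analysis would also need motif scanning to decide which sequences count as hits, and multiple-testing correction across motifs.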
Funder
National Institute of Neurological Disorders and Stroke
National Institute on Aging
National Human Genome Research Institute
Publisher
Public Library of Science (PLoS)