Systematic analysis of the genomic features involved in the binding preferences of transcription factors

Published: 2022-08-16

Author:

Romero Raphaël, Menichelli Christophe, Marin Jean-Michel, Lèbre Sophie, Lecellier Charles-Henri, Bréhélin Laurent

Abstract

Transcription factors (TFs) orchestrate gene expression and are at the core of cell-specific phenotypes and functions. One given TF can therefore have different binding sites depending on cell type and conditions. However, the TF core motif, as represented by a Position Weight Matrix for instance, is often, if not invariably, cell agnostic. Likewise, paralogous TFs recognize very similar motifs while binding different genomic regions. We propose a machine learning approach called TFscope aimed at identifying the DNA features explaining the binding differences observed between two ChIP-seq experiments targeting either the same TF in two cell types or treatments, or two paralogous TFs. TFscope systematically investigates differences in i) core motif, ii) nucleotide environment around the binding site and iii) presence and location of co-factor motifs. It provides the main DNA features that have been detected, and the contribution of each of these features to explaining the binding differences. TFscope has been applied to more than 350 pairs of ChIP-seq experiments. Our experiments showed that the approach is accurate and that the genomic features distinguishing TF binding in two different settings vary according to the TFs considered and/or the conditions. Several samples are presented and discussed to illustrate these findings. For TFs in different cell types or with different treatments, co-factors and nucleotide environment often explain most of the binding-site differences, while for paralogous TFs, subtle differences in the core motif seem to be the main reason for the observed differences in our experiments. The source code (Python), data and results of the experiments described in this article are available at https://gite.lirmm.fr/rromero/tfscope.

Publisher

Cold Spring Harbor Laboratory

References (54 articles)

1. Non-consensus Protein Binding to Repetitive DNA Sequence Elements Significantly Affects Eukaryotic Genomes; PLOS Computational Biology, 2015

2. Predicting mRNA Abundance Directly from Genomic Sequence Using Deep Convolutional Neural Networks; Cell Reports, 2020

3. Transcriptional enhancers: Intelligent enhanceosomes or flexible billboards?

4. Base-resolution models of transcription-factor binding reveal soft motif syntax; 2021

5. Timothy L Bailey. STREME: accurate and versatile sequence motif discovery. Bioinformatics, (btab203), March 2021.