Abstract
ABSTRACTTranscription factor (TF) binding to DNA is critical to transcription regulation. Although the binding properties of numerous individual TFs are well-documented, a more detailed comprehension of how TFs interact cooperatively with DNA, forming either complex or co-binding to the same region, is required. Indeed, the combinatorial binding of TFs is essential to cell differentiation, development, and response to external stimuli. We present COBIND, a novel method based on non-negative matrix factorization (NMF) to identify TF co-binding patterns automatically. COBIND applies NMF to one-hot encoded regions flanking known TF binding sites (TFBSs) to pinpoint enriched DNA patterns at fixed distances. We applied COBIND to 8,293 TFBS datasets from UniBind for 404 TFs in seven species. The method uncovered already established co-binding patterns (e.g.,between POU5F1 and SOX2 or SOX17) and new co-binding configurations not yet reported in the literature and inferred through motif similarity and protein-protein interaction knowledge. Our extensive analyses across species revealed that 84% of the studied TFs share a co-binding motif with other TFs from the same structural family. The co-binding patterns captured by COBIND are likely functionally relevant as they harbor higher evolutionarily conservation than isolated TFBSs. Open chromatin data from matching human cell lines further supported the co-binding predictions. Finally, we used single-molecule footprinting data from mouse embryonic stem cells to confirm that the co-binding events captured by COBIND were likely occurring on the same DNA molecules.
Publisher
Cold Spring Harbor Laboratory