Abstract
The availability of large-scale epigenomic data from different cell types and conditions has provided valuable information for evaluating and learning features that predict co-binding of transcription factors (TFs). However, previous attempts to develop models for predicting motif co-occurrence were not scalable to global analysis of arbitrary motif combinations or to cross-species predictions. Further, mapping co-regulatory modules (CRMs) to their gene regulatory networks (GRNs) is crucial for understanding their underlying function. Currently, there is no comprehensive pipeline to locate CRMs and GRNs quickly and accurately at large scale. In this study, we analyzed and evaluated different TF binding characteristics that could facilitate biologically significant co-binding, in order to identify all possible clusters of co-binding TFs. We curated the UniBind database, which contains ChIP-Seq data from over 1983 samples and 232 TFs, and implemented two machine learning models to predict CRMs and the potential regulatory networks on which they operate. We narrowed our focus to heart-related regulatory motifs. Our findings highlight the importance of the NKX family of transcription factors in cardiac development and provide potential targets for further investigation in cardiac disease.
Publisher
Cold Spring Harbor Laboratory