Abstract
AbstractOur understanding of the precise locations ofcis-regulatory elements (CRMs) in the genomes, as well as their functional types (enhancer or silencer), states (active or inactive) and target genes in various cell/tissue types of organisms remains limited, despite recent progresses. To address these challenges, we have recently developed a two-step strategy that first predicts a more complete map of CRMs in the genome, and then predicts the functional states of the CRMs. However, our initial approach lacked the ability to differentiate between the functional types of CRMs. Therefore, we utilized distinct features to simultaneously predict the functional types and states of the CRMs. Applying our method to 107 cell/tissue types with the minimum of required data available, we predicted 868,948 (73.8%) of the CRMs to be active as enhancers or silencers in at least one of these cell/tissue types. In 56 cell/tissue types with required data available for both enhancers and silencers, we predicted that 117,646 (14.8%) and 227,211 (28.6%) CRMs only functioned as enhancers (enhancer-predominant) and silencers (silencer-predominant), respectively, while 83,985 (10.6%) functioned both as enhancers and silencers (dual functional). Thus, both dual functional CRMs and silencers might be more prevalent than previously assumed. Most dual functional CRMs function either as enhancers or silencers in different cell/tissue types (Type I), while some have dual functions regulating different genes in the same cell/tissue types (Type II). Different types of CRMs display different lengths and TFBS densities, reflecting the complexity of their functions. Our two-step approach can accurately predict the functional types and states of CRMs using data of only five epigenetic marks in a cell/tissue type.Author SummaryCRMs function as enhancers and/or silencers to promote and repress, respectively, the transcription of genes in a spatiotemporal manner, thereby playing critical roles in virtually all biological processes. However, despite recent progress, the understanding of CRMs remains limited. Most existing methods are aimed to simultaneously predict the locations and functional states of enhancers in a given cell/tissue type, however, the accuracy of these one-step methods is low. We have recently developed a two-step strategy that first predicts locations of CRMs in the genome, and then predicts their functional states as enhancers in cell/tissue types with high accuracy. However, our initial approach was unable to differentiate between enhancers and silencers. Therefore, in this study, we employ two machine-learning models, so that we can simultaneously predict the functional states and types of our previously predicted 1.2M CRMs in various cell/tissue types. Applying the method to cell/tissue types with the data available, we categorize the CRMs into four types with distinct properties reflecting their functional complexity. Our results indicate that silencers and dual functional CRMs might be more prevalent than previously assumed. The precise prediction of CRM types and states provides opportunities to pinpoint their target genes, thus opening new avenues for research.
Publisher
Cold Spring Harbor Laboratory