Abstract
It has long been known that exons can encode transcriptional enhancers. However, the prevalence of such dual-use exons and related questions remain unresolved. Our recently predicted large, highly accurate sets of cis-regulatory module candidates (CRMCs) and non-CRMCs in the human genome provide an opportunity to address these questions. We find that exonic transcription factor binding sites (eTFBSs) occupy at least a third of the total exon length, suggesting that exonic enhancers (eEHs) are more prevalent than previously thought. Moreover, active eTFBSs significantly overlap experimentally determined active eEHs and enhance the transcription of nearby genes. Furthermore, both A/T and C/G in eTFBSs are more likely to be under evolutionary selection than those in non-CRMC exons, indicating that the eTFBSs might be dual-use. Interestingly, eTFBSs in codons tend to encode loops rather than the more critical helices and strands in protein structures, while eTFBSs in untranslated regions (UTRs) tend to avoid positions where known UTR-related functions are located. Intriguingly, active eTFBSs are found to be in close physical proximity to distal promoters and to be involved in the activation of target genes. The close physical proximity between exons and promoters in topologically associating domains might lead less critical exons to be co-opted as parts of enhancers when non-exonic sequences are unavailable due to space constraints. It appears that nature avoids the dilemma of evolving a sequence for two unrelated functions by using less critical, physically available exons for eTFBSs. Therefore, the prevalent dual use of exons is not only possible but also inevitable.
Publisher
Cold Spring Harbor Laboratory