Abstract
AbstractAcross tissues, gene expression is regulated by a combination of determinants, including the binding of transcription factors (TFs), along with other aspects of cellular state. Recent studies emphasize the importance of both genetic and epigenetic states – TF binding sites and binding site chromatin accessibility have emerged as potentially causal determinants of tissue specificity. To investigate the relative contributions of these determinants, we constructed three genome-scale datasets for both root and shoot tissues of the same Arabidopsis thaliana plants: TSS-seq data to identify Transcription Start Sites, OC-seq data to identify regions of Open Chromatin, and RNA-seq data to assess gene expression levels. For genes that are differentially expressed between root and shoot, we constructed a machine learning model predicting tissue of expression from chromatin accessibility and TF binding information upstream of TSS locations. The resulting model was highly accurate (over 90% auROC and auPRC), and our analysis of model contributions (feature weights) strongly suggests that patterns of TF binding sites within ∼500 nt TSS-proximal regions are predominant explainers of tissue of expression in most cases. Thus, in plants, cis-regulatory control of tissue-specific gene expression appears to be primarily determined by TSS-proximal sequences, and rarely by distal enhancer-like accessible chromatin regions. This study highlights the exciting future possibility of a native TF site-based design process for the tissue-specific targeting of plant gene promoters.
Publisher
Cold Spring Harbor Laboratory
Cited by
1 articles.
订阅此论文施引文献
订阅此论文施引文献,注册后可以免费订阅5篇论文的施引文献,订阅后可以查看论文全部施引文献
1. Learning the Regulatory Code of Gene Expression;Frontiers in Molecular Biosciences;2021-06-10