Abstract
The evolution of gene expression responses is a critical component of adaptation to variable environments. Predicting how DNA sequence influences expression is challenging because the genotype-to-phenotype map is not well resolved for cis-regulatory elements, transcription factor binding, regulatory interactions, and epigenetic features, not to mention how these factors respond to the environment. We tested whether flexible machine learning models could learn some of the underlying cis-regulatory genotype-to-phenotype map, using cold-responsive transcriptome profiles from five diverse Arabidopsis thaliana accessions. We first tested for evidence that cis-regulation plays a role in environmental response, finding 14 and 15 motifs that were significantly enriched within the upstream and downstream regions, respectively, of cold-responsive differentially regulated genes (DEGs). We next applied convolutional neural networks (CNNs), which learn de novo cis-regulatory motifs in DNA sequences, to predict expression response to environment. CNNs predicted differential expression with moderate accuracy, with evidence that predictions were hindered by the biological complexity of regulation and the large potential regulatory code. Overall, DEGs between specific environments can be predicted from variation in cis-regulatory sequences, although more information needs to be incorporated and better models may be required.
Publisher
Cold Spring Harbor Laboratory