Modeling 0.6 million genes for the rational design of functional cis-regulatory variants and de novo design of cis-regulatory sequences

Author:

Li Tianyi (1), Xu Hui (1), Teng Shouzhen (1), Suo Mingrui (1), Bahitwa Revocatus (1, 2), Xu Mingchi (1), Qian Yiheng (1), Ramstein Guillaume P. (3), Song Baoxing (4, 5), Buckler Edward S. (6, 7), Wang Hai (1, 8, 9)

Affiliation:

1. State Key Laboratory of Maize Bio-breeding, National Maize Improvement Center, Frontiers Science Center for Molecular Design Breeding, Department of Plant Genetics and Breeding, China Agricultural University, Beijing 100193, People’s Republic of China

2. Legumes Research Program, Research and Innovation Division, Tanzania Agricultural Research Institute, Ilonga, Kilosa, Morogoro 67410, Tanzania

3. Center for Quantitative Genetics and Genomics, Aarhus University, Aarhus 8000, Denmark

4. National Key Laboratory of Wheat Improvement, Peking University Institute of Advanced Agricultural Sciences, Shandong Laboratory of Advanced Agriculture Sciences in Weifang, Weifang, Shandong 261325, People’s Republic of China

5. Key Laboratory of Maize Biology and Genetic Breeding in Arid Area of Northwest Region of the Ministry of Agriculture, College of Agronomy, Northwest A&F University, Yangling, Shaanxi 712100, People’s Republic of China

6. Institute for Genomic Diversity, Cornell University, Ithaca, NY 14853

7. Agricultural Research Service, United States Department of Agriculture, Ithaca, NY 14853

8. Center for Crop Functional Genomics and Molecular Breeding, China Agricultural University, Beijing 100193, People’s Republic of China

9. Sanya Institute of China Agricultural University, Sanya 572025, People’s Republic of China

Abstract

Rational design of plant cis-regulatory DNA sequences without expert intervention or prior domain knowledge is still a daunting task. Here, we developed PhytoExpr, a deep learning framework capable of predicting both mRNA abundance and plant species using the proximal regulatory sequence as the sole input. PhytoExpr was trained over 17 species representative of major clades of the plant kingdom to enhance its generalizability. Via input perturbation, quantitative functional annotation of the input sequence was achieved at single-nucleotide resolution, revealing an abundance of predicted high-impact nucleotides in conserved noncoding sequences and transcription factor binding sites. Evaluation of maize HapMap3 single-nucleotide polymorphisms (SNPs) by PhytoExpr demonstrates an enrichment of predicted high-impact SNPs in cis-eQTL. Additionally, we provided two algorithms that harnessed the power of PhytoExpr in designing functional cis-regulatory variants, and de novo creation of species-specific cis-regulatory sequences through in silico evolution of random DNA sequences. Our model represents a general and robust approach for functional variant discovery in population genetics and rational design of regulatory sequences for genome editing and synthetic biology.
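
The two ideas named in the abstract, input perturbation at single-nucleotide resolution and in silico evolution of random DNA, can be illustrated with a minimal Python sketch. This is not PhytoExpr and not the authors' algorithms: `toy_score` is a hypothetical stand-in for a trained sequence-to-expression model, and the greedy hill-climbing in `evolve` is an assumed search strategy chosen only for brevity, since the abstract does not specify one.

```python
import random

BASES = "ACGT"


def toy_score(seq: str) -> float:
    """Hypothetical stand-in for a trained predictor (NOT PhytoExpr).

    Rewards a TATA-box-like motif and, weakly, GC content.
    """
    gc_fraction = (seq.count("G") + seq.count("C")) / max(len(seq), 1)
    return seq.count("TATAAA") + 0.5 * gc_fraction


def saturation_mutagenesis(seq: str, score=toy_score) -> list[float]:
    """Input perturbation: per position, the largest absolute change in the
    predicted score over the three possible single-base substitutions."""
    base = score(seq)
    impact = []
    for i, ref in enumerate(seq):
        deltas = [
            abs(score(seq[:i] + alt + seq[i + 1:]) - base)
            for alt in BASES
            if alt != ref
        ]
        impact.append(max(deltas))
    return impact


def evolve(length: int, rounds: int, score=toy_score, seed: int = 0):
    """In silico evolution (assumed greedy variant): start from random DNA and
    accept, in each round, the single substitution that most raises the score.
    Returns (starting sequence, evolved sequence)."""
    rng = random.Random(seed)
    start = "".join(rng.choice(BASES) for _ in range(length))
    seq = start
    for _ in range(rounds):
        best_seq, best = seq, score(seq)
        for i, ref in enumerate(seq):
            for alt in BASES:
                if alt == ref:
                    continue
                cand = seq[:i] + alt + seq[i + 1:]
                s = score(cand)
                if s > best:
                    best_seq, best = cand, s
        if best_seq == seq:  # no improving substitution left
            break
        seq = best_seq
    return start, seq


if __name__ == "__main__":
    print(saturation_mutagenesis("GGTATAAAGG"))
    start, end = evolve(length=30, rounds=5)
    print(toy_score(start), toy_score(end))
```

With a real model, `toy_score` would be replaced by a call to the trained predictor; the per-position impact list then plays the role of a single-nucleotide functional annotation, and the same scoring function drives the sequence search.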

Funder

National Key Research and Development Program of China

National Natural Science Foundation of China

Publisher

Proceedings of the National Academy of Sciences
