Design of synthetic promoters for cyanobacteria with generative deep-learning model

Author:

Seo Euijin (1), Choi Yun-Nam (1), Shin Ye Rim (1), Kim Donghyuk (2, 3), Lee Jeong Wook (1, 4, 5)

Affiliation:

1. Department of Chemical Engineering, Pohang University of Science and Technology (POSTECH), 77 Cheongam-Ro, Nam-Gu, Pohang, Gyeongbuk 37673, Korea

2. School of Energy and Chemical Engineering, Ulsan National Institute of Science and Technology (UNIST), 50 UNIST-Gil, Eonyang-Eup, Ulsan 44919, Korea

3. Department of Energy Engineering, Ulsan National Institute of Science and Technology (UNIST), 50 UNIST-Gil, Eonyang-Eup, Ulsan 44919, Korea

4. School of Interdisciplinary Bioscience and Bioengineering, Pohang University of Science and Technology (POSTECH), 77 Cheongam-Ro, Nam-Gu, Pohang, Gyeongbuk 37673, Korea

5. Graduate School of Artificial Intelligence, Pohang University of Science and Technology (POSTECH), 77 Cheongam-Ro, Nam-Gu, Pohang, Gyeongbuk 37673, Korea

Abstract

Deep generative models, which can approximate complex data distributions from large datasets, are widely used in the analysis of biological datasets. In particular, they can identify and unravel hidden traits encoded within complicated nucleotide sequences, allowing us to design genetic parts with accuracy. Here, we provide a generic deep-learning-based framework for designing and evaluating synthetic promoters for cyanobacteria using generative models, which we in turn validated with a cell-free transcription assay. We developed a deep generative model and a predictive model using a variational autoencoder and a convolutional neural network, respectively. Using native promoter sequences of the model unicellular cyanobacterium Synechocystis sp. PCC 6803 as a training dataset, we generated 10 000 synthetic promoter sequences and predicted their strengths. Position weight matrix and k-mer analyses confirmed that our model captured valid features of cyanobacterial promoters from the dataset. Furthermore, critical subregion identification analysis consistently revealed the importance of the -10 box sequence motif in cyanobacterial promoters. Moreover, using a cell-free transcription assay, we validated that the generated promoter sequences can efficiently drive transcription. This approach, combining in silico and in vitro studies, will provide a foundation for the rapid design and validation of synthetic promoters, especially for non-model organisms.
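
The abstract names position weight matrix and k-mer analyses as the checks applied to generated sequences. As a rough illustration of what such analyses compute, the following is a minimal Python sketch, not the authors' implementation. The function names, the pseudocount, the uniform 0.25 background, and the toy hexamers are all assumptions made for illustration; none of them come from the paper.

```python
from collections import Counter
import math

BASES = "ACGT"

def kmer_frequencies(seqs, k):
    """Relative frequency of each k-mer observed across all sequences."""
    counts = Counter()
    for s in seqs:
        for i in range(len(s) - k + 1):
            counts[s[i:i + k]] += 1
    total = sum(counts.values())
    return {kmer: n / total for kmer, n in counts.items()}

def position_weight_matrix(seqs, pseudocount=1.0, background=0.25):
    """Log2-odds PWM for equal-length sequences (assumed uniform background)."""
    length = len(seqs[0])
    if any(len(s) != length for s in seqs):
        raise ValueError("all sequences must have the same length")
    denom = len(seqs) + pseudocount * len(BASES)
    pwm = []
    for pos in range(length):
        column = Counter(s[pos] for s in seqs)
        # Smoothed base probability divided by the background, in log2 units.
        pwm.append({
            b: math.log2(((column[b] + pseudocount) / denom) / background)
            for b in BASES
        })
    return pwm

# Toy input: invented variants of a TATAAT-like hexamer, not data from the study.
toy = ["TATAAT", "TATAAT", "TACAAT", "TATACT"]
freqs = kmer_frequencies(toy, 3)
pwm = position_weight_matrix(toy)
```

Comparing `freqs` or `pwm` computed on native promoters against the same quantities computed on generated promoters is one simple way to judge whether a generative model reproduces motif-level statistics; the exact comparison used in the paper is not described in the abstract.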

Funder

Bio & Medical Technology Development Program of the National Research Foundation

Ministry of Science & ICT

C1 Gas Refinery Program

MSIT

Publisher

Oxford University Press (OUP)

Subject

Genetics

Cited by 16 articles.
