Accurate and efficient protein sequence design through learning concise local environment of residues

Authors:

Huang Bin (1, 2), Fan Tingwen (3), Wang Kaiyue (4, 5), Zhang Haicang (1, 2, 6), Yu Chungong (1, 2, 6), Nie Shuyu (3, 7), Qi Yangshuo (3, 7), Zheng Wei-Mou (2, 8), Han Jian (3), Fan Zheng (9), Sun Shiwei (1, 2, 6), Ye Sheng (4, 5), Yang Huaiyi (2, 3), Bu Dongbo (1, 2, 6)

Affiliations:

1. Key Lab of Intelligent Information Processing, SKLP, Institute of Computing Technology, Chinese Academy of Sciences, Beijing 100190, China

2. University of Chinese Academy of Sciences, Beijing 100110, China

3. Key Lab of Microbial Physiological & Metabolic Engineering, State Key Lab of Mycology, Institute of Microbiology, Chinese Academy of Sciences, Beijing 100101, China

4. Beijing Advanced Innovation Center for Big Data-based Precision Medicine, School of Engineering Medicine, Beihang University, Beijing 100083, China

5. Key Laboratory of Big Data-based Precision Medicine (Beihang University), Ministry of Industry and Information Technology of the People’s Republic of China, Beijing 100083, China

6. Zhongke Big Data Academy, Zhengzhou, Henan 450046, China

7. School of Life Sciences, Hebei University, Baoding, Hebei 071002, China

8. Institute of Theoretical Physics, Chinese Academy of Sciences, Beijing 100190, China

9. Institutional Center for Shared Technologies and Facilities, Institute of Microbiology, Chinese Academy of Sciences, Beijing 100101, China

Abstract

Motivation: Computational protein sequence design has been widely applied in rational protein engineering and increasing the design accuracy and efficiency is highly desired.

Results: Here, we present ProDESIGN-LE, an accurate and efficient approach to protein sequence design. ProDESIGN-LE adopts a concise but informative representation of the residue’s local environment and trains a transformer to learn the correlation between local environment of residues and their amino acid types. For a target backbone structure, ProDESIGN-LE uses the transformer to assign an appropriate residue type for each position based on its local environment within this structure, eventually acquiring a designed sequence with all residues fitting well with their local environments. We applied ProDESIGN-LE to design sequences for 68 naturally occurring and 129 hallucinated proteins within 20 s per protein on average. The designed proteins have their predicted structures perfectly resembling the target structures with a state-of-the-art average TM-score exceeding 0.80. We further experimentally validated ProDESIGN-LE by designing five sequences for an enzyme, chloramphenicol O-acetyltransferase type III (CAT III), and recombinantly expressing the proteins in Escherichia coli. Of these proteins, three exhibited excellent solubility, and one yielded monomeric species with circular dichroism spectra consistent with the natural CAT III protein.

Availability and implementation: The source code of ProDESIGN-LE is available at https://github.com/bigict/ProDESIGN-LE.
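The per-position assignment described in the abstract can be illustrated with a minimal sketch. This is not the ProDESIGN-LE implementation. Everything in it is an assumption made for illustration: the function names, the use of bare C-alpha coordinates, the distance-cutoff definition of a local environment, and the toy predictor that stands in for the trained transformer. The published code at the URL above is the authoritative version and may represent environments and iterate over positions quite differently.

```python
import math
from typing import Callable, Dict, List, Sequence, Tuple

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"
Coord = Tuple[float, float, float]


def local_environment(coords: Sequence[Coord], i: int, radius: float) -> List[int]:
    """Return indices of residues within `radius` of residue i, nearest first.

    Assumption: the environment is defined by a plain distance cutoff on
    one coordinate per residue; the paper's representation is richer.
    """
    center = coords[i]
    near = []
    for j, other in enumerate(coords):
        if j == i:
            continue
        d = math.dist(center, other)
        if d <= radius:
            near.append((d, j))
    near.sort()
    return [j for _, j in near]


def design_sequence(
    coords: Sequence[Coord],
    predict: Callable[[int, List[int]], Dict[str, float]],
    radius: float = 10.0,
) -> str:
    """Assign each position the residue type the predictor scores highest."""
    sequence = []
    for i in range(len(coords)):
        env = local_environment(coords, i, radius)
        probs = predict(i, env)
        best = max(AMINO_ACIDS, key=lambda aa: probs.get(aa, 0.0))
        sequence.append(best)
    return "".join(sequence)


def toy_predictor(i: int, env: List[int]) -> Dict[str, float]:
    """Hypothetical stand-in for the trained model.

    Positions with two or more neighbours favour leucine (L),
    all others favour lysine (K).
    """
    if len(env) >= 2:
        return {"L": 0.9, "K": 0.1}
    return {"K": 0.9, "L": 0.1}


# Four residues on a line; the last one is far from the rest.
example_coords = [(0.0, 0.0, 0.0), (3.8, 0.0, 0.0), (7.6, 0.0, 0.0), (30.0, 0.0, 0.0)]
designed = design_sequence(example_coords, toy_predictor, radius=5.0)
```

In this sketch the predictor is passed in as a callable, so the toy rule could be swapped for a real model without touching the loop over positions.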

Funder

National Key Research and Development Program of China

Publisher

Oxford University Press (OUP)

Subject

Computational Mathematics, Computational Theory and Mathematics, Computer Science Applications, Molecular Biology, Biochemistry, Statistics and Probability

Cited by 12 articles.
