Semantical and Topological Protein Encoding Toward Enhanced Bioactivity and Thermostability

Published: 2023-12-02

Author:

Tan Yang, Zhou Bingxin, Zheng Lirong, Fan Guisheng, Hong Liang

Abstract

Protein engineering is a pivotal aspect of synthetic biology, involving the modification of amino acids within existing protein sequences to achieve novel or enhanced functionalities and physical properties. Accurate prediction of protein variant effects requires a thorough understanding of protein sequence, structure, and function. Deep learning methods have demonstrated remarkable performance in guiding protein modification for improved functionality. However, existing approaches predominantly rely on protein sequences, which struggle to efficiently encode the geometry of amino acids’ local environments and often fail to capture crucial details of protein folding stability, internal molecular interactions, and bio-functions. Furthermore, no fundamental evaluation exists of how well developed methods predict protein stability, even though it is a key physical property that is frequently investigated in practice. To address these challenges, this paper introduces a novel pre-training framework that integrates sequential and geometric encoders for protein primary and tertiary structures. The framework guides mutation directions toward desired traits by simulating natural selection on wild-type proteins and evaluates variant effects based on their fitness to perform specific functions. We assess the proposed approach using four benchmarks comprising approximately 200 deep mutational scanning assays: two refined benchmarks for protein activities and two novel benchmarks for thermostability. Across extensive experiments, the prediction results show exceptional performance compared to other zero-shot learning methods, while requiring only a minimal number of trainable parameters. This study not only proposes an effective framework for more accurate and comprehensive predictions to facilitate efficient protein engineering, but also enhances the in silico assessment system for future deep learning models to better align with empirical requirements.

Publisher

Cold Spring Harbor Laboratory

Cited by 4 articles.

1. Simple, Efficient, and Scalable Structure-Aware Adapter Boosts Protein Language Models;Journal of Chemical Information and Modeling;2024-08-07

2. PETA: evaluating the impact of protein transfer learning with sub-word tokenization on downstream applications;Journal of Cheminformatics;2024-08-02

3. Protein Engineering with Lightweight Graph Denoising Neural Networks;Journal of Chemical Information and Modeling;2024-04-17

4. ProSST: Protein Language Modeling with Quantized Structure and Disentangled Attention;2024-04-17