ProtWave-VAE: Integrating autoregressive sampling with latent-based inference for data-driven protein design

Published: 2023-04-23

Author:

Praljak Niksa, Lian Xinran, Ranganathan Rama, Ferguson Andrew L.

Abstract

Deep generative models (DGMs) have shown great success in the understanding and data-driven design of proteins. Variational autoencoders (VAEs) are a popular DGM approach that can learn the correlated patterns of amino acid mutations within a multiple sequence alignment (MSA) of protein sequences and distill this information into a low-dimensional latent space to expose phylogenetic and functional relationships and guide generative protein design. Autoregressive (AR) models are another popular DGM approach that typically lack a low-dimensional latent embedding but do not require training sequences to be aligned into an MSA and enable the design of variable-length proteins. In this work, we propose ProtWave-VAE as a novel and lightweight DGM employing an information-maximizing VAE with a dilated convolution encoder and autoregressive WaveNet decoder. This architecture blends the strengths of the VAE and AR paradigms in enabling training over unaligned sequence data and the conditional generative design of variable-length sequences from an interpretable low-dimensional learned latent space. We evaluate the model’s ability to infer patterns and design rules within alignment-free homologous protein family sequences and to design novel synthetic proteins in four diverse protein families. We show that our model can infer meaningful functional and phylogenetic embeddings within latent spaces and make highly accurate predictions within semi-supervised downstream fitness prediction tasks. In an application to the C-terminal SH3 domain in the Sho1 transmembrane osmosensing receptor in baker’s yeast, we subject ProtWave-VAE-designed sequences to experimental gene synthesis and select-seq assays for osmosensing function to show that the model enables de novo generative design, conditional C-terminus diversification, and engineering of osmosensing function into SH3 paralogs.
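As a rough illustration of the dilated, causal convolutions that WaveNet-style decoders are built from, the following minimal sketch may help. It is not the ProtWave-VAE implementation: the function names, the kernel size of 2, and the dilation schedule 1, 2, 4, 8 are assumptions chosen for illustration only. It shows that each output position depends only on the current and earlier inputs, which is the property that allows residue-by-residue sampling of variable-length sequences.

```python
def receptive_field(kernel_size, dilations):
    """Number of input positions visible to the top layer of a causal dilated stack."""
    return 1 + sum((kernel_size - 1) * d for d in dilations)


def causal_dilated_conv(x, weights, dilation):
    """1-D causal convolution: out[t] uses only x[t], x[t - d], x[t - 2d], ..."""
    k = len(weights)
    out = []
    for t in range(len(x)):
        acc = 0.0
        for j, w in enumerate(weights):
            idx = t - (k - 1 - j) * dilation  # look back, never forward
            if idx >= 0:  # positions before the sequence start act as zero padding
                acc += w * x[idx]
        out.append(acc)
    return out


# Hypothetical schedule: kernel size 2 with dilations doubling per layer.
rf = receptive_field(2, [1, 2, 4, 8])

# Causality check: changing a later input leaves earlier outputs unchanged.
a = causal_dilated_conv([1.0, 2.0, 3.0, 4.0], [1.0, 1.0], dilation=2)
b = causal_dilated_conv([1.0, 2.0, 3.0, 99.0], [1.0, 1.0], dilation=2)
```

Doubling the dilation at each layer grows the receptive field exponentially with depth while the parameter count grows only linearly, which is one reason such stacks are considered lightweight.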

Publisher

Cold Spring Harbor Laboratory

References: 58 articles.

1. Protein sequence design with deep generative models

2. 100th anniversary of macromolecular science viewpoint: data-driven protein design; ACS Macro Letters, 2021

3. Protein design via deep learning; Briefings in Bioinformatics, 2022

4. Data-driven computational protein design; Current Opinion in Structural Biology, 2021

5. Deep dive into machine learning models for protein engineering; Journal of Chemical Information and Modeling, 2020

Cited by 2 articles.

1. DeCOIL: Optimization of Degenerate Codon Libraries for Machine Learning-Assisted Protein Engineering; ACS Synthetic Biology; 2023-07-31

2. DeCOIL: Optimization of Degenerate Codon Libraries for Machine Learning-Assisted Protein Engineering; 2023-05-11