Abstract
Large protein language models (PLMs) show great potential to reshape protein research by encoding amino acid sequences into mathematically and biologically meaningful embeddings. However, the lack of crucial 3D structure information in most PLMs restricts their predictive capacity in various applications, especially those heavily dependent on 3D structures. To address this issue, we introduce S-PLM, a 3D structure-aware PLM that uses multi-view contrastive learning to align the sequence and 3D structure of a protein in a shared coordinate space. S-PLM applies a Swin-Transformer to AlphaFold-predicted protein structures to embed structural information and fuses it into the sequence-based embedding from ESM2. Additionally, we provide a library of lightweight tuning tools to adapt S-PLM to diverse protein property prediction tasks. Our results demonstrate S-PLM's superior performance over sequence-only PLMs and its competitiveness in protein function prediction against state-of-the-art methods that use both sequence and structure inputs.
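To make the alignment idea concrete, below is a minimal sketch of a CLIP-style multi-view contrastive objective between per-protein sequence and structure embeddings. It is illustrative only, not the authors' implementation: the class name, projection heads, embedding dimensions (1280 matches ESM2-650M's hidden size; the structure dimension is a placeholder for the Swin-Transformer output), and temperature are all assumptions.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class ContrastiveAligner(nn.Module):
    """Hypothetical module: projects sequence and structure embeddings into a
    shared coordinate space and scores agreement with an InfoNCE-style loss."""

    def __init__(self, seq_dim=1280, struct_dim=768, joint_dim=256, temperature=0.07):
        super().__init__()
        # seq_dim=1280 matches ESM2-650M; struct_dim is an assumed size
        # for pooled Swin-Transformer features of the predicted structure.
        self.seq_proj = nn.Linear(seq_dim, joint_dim)
        self.struct_proj = nn.Linear(struct_dim, joint_dim)
        self.temperature = temperature

    def forward(self, seq_emb, struct_emb):
        # L2-normalize both views so the dot product is cosine similarity.
        z_seq = F.normalize(self.seq_proj(seq_emb), dim=-1)
        z_struct = F.normalize(self.struct_proj(struct_emb), dim=-1)
        logits = z_seq @ z_struct.t() / self.temperature
        # Matching sequence/structure pairs of the same protein lie on the diagonal.
        targets = torch.arange(len(seq_emb), device=seq_emb.device)
        loss_seq = F.cross_entropy(logits, targets)        # sequence -> structure
        loss_struct = F.cross_entropy(logits.t(), targets)  # structure -> sequence
        return (loss_seq + loss_struct) / 2

# Usage with random stand-ins for per-protein embeddings:
aligner = ContrastiveAligner()
seq_emb = torch.randn(8, 1280)    # e.g., mean-pooled ESM2 token embeddings
struct_emb = torch.randn(8, 768)  # e.g., pooled Swin-Transformer features
loss = aligner(seq_emb, struct_emb)
loss.backward()
```

The symmetric cross-entropy pulls each protein's two views together while pushing apart embeddings of different proteins in the batch, which is what "aligning sequence and structure in a coordinate space" amounts to under this sketch's assumptions.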
Publisher
Cold Spring Harbor Laboratory
Cited by
6 articles.