Authors:
Yaiza Serrano, Sergi Roda, Victor Guallar, Alexis Molina
Abstract
Large Language Models (LLMs) have demonstrated exceptional capabilities in understanding contextual relationships, outperforming traditional methodologies in downstream tasks such as text generation and sentence classification. This success has been mirrored in the realm of protein language models (pLMs), where proteins are encoded as text via their amino acid sequences. However, the training of pLMs, which involves tens to hundreds of millions of sequences and hundreds of millions to billions of parameters, poses a significant computational challenge. In this study, we introduce a Small-Scale Protein Language Model (SS-pLM), a more accessible approach that requires training on merely millions of representative sequences and reduces the number of trainable parameters to 14.8M. This model significantly lowers the computational load, thereby democratizing the use of foundational models in protein studies. We demonstrate that the performance of our model, when fine-tuned to a specific set of sequences for generation, is comparable to that of larger, more computationally demanding pLMs.
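For intuition on what a pLM in the ~15M-parameter range looks like, the sketch below instantiates a GPT-style decoder-only model over an amino-acid vocabulary using Hugging Face Transformers. The hidden size, layer count, sequence length, and vocabulary are illustrative assumptions chosen to land near the abstract's 14.8M trainable parameters; they are not the configuration reported in the paper.

```python
# Hypothetical sketch of a small-scale protein language model (SS-pLM-like).
# All hyperparameters below are assumptions for illustration only.
from transformers import GPT2Config, GPT2LMHeadModel

# Amino-acid-level vocabulary: 20 standard residues plus a few special tokens
# (pad, bos, eos, unk). The actual tokenizer used by the authors may differ.
VOCAB_SIZE = 32
MAX_LEN = 1024  # assumed maximum protein sequence length

config = GPT2Config(
    vocab_size=VOCAB_SIZE,
    n_positions=MAX_LEN,
    n_embd=320,   # hidden size (illustrative)
    n_layer=12,   # number of transformer blocks (illustrative)
    n_head=8,     # attention heads (illustrative)
)

model = GPT2LMHeadModel(config)
n_params = sum(p.numel() for p in model.parameters())
print(f"Trainable parameters: {n_params / 1e6:.1f}M")  # roughly 15M with these settings
```

After pre-training on a representative sequence set and fine-tuning on a target family, such a causal model can generate new sequences autoregressively (e.g., via `model.generate` from a start token), which is the generation setting the abstract evaluates against larger pLMs.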
Publisher
Cold Spring Harbor Laboratory
Cited by
3 articles.