Harnessing Transformers to Generate Protein Sequences Prone to Liquid-Liquid Phase Separation

Author:

Wasim Abdul, Pramanik Ushasi, Das Anirban, Latua Pikaso, Rudra Jai S., Mondal Jagannath

Abstract

Understanding the molecular grammar that governs protein phase separation is essential for advances in bioinformatics and protein engineering. This study uses Generative Pre-trained Transformer (GPT)-based Protein Language Models (PLMs) to decode the grammar of proteins prone to liquid-liquid phase separation (LLPS). We trained three distinct GPT models on datasets of amino acid sequences with varying LLPS propensities: highly predisposed (LLPS+ GPT), moderate (LLPS- GPT), and resistant (PDB* GPT). As training progressed, the LLPS-prone model learned embeddings distinct from those of LLPS-resistant sequences. The models generated 18,000 protein sequences, 20 to 200 amino acids long, that showed low similarity to known sequences in the SwissProt database. Statistical analysis revealed subtle but significant differences in amino acid occurrence probabilities between sequences from the LLPS-prone and LLPS-resistant models, suggesting that distinct molecular grammars underlie their phase separation abilities. Notably, sequences from LLPS+ GPT showed fewer aromatic residues and a higher fraction of charge decoration. Short peptides (20-25 amino acids) generated by LLPS+ GPT underwent computational and wet-lab validation, which demonstrated their ability to form phase-separated states in vitro. The generated sequences enriched the existing database and enabled the development of a robust classifier that accurately distinguishes LLPS-prone from non-LLPS sequences. This research marks a significant advance in using computational models to explore and engineer the vast protein sequence space associated with LLPS-prone proteins.
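The composition comparison described in the abstract (amino acid occurrence probabilities, aromatic content, charged residues) can be illustrated with a minimal sketch. This is not the authors' analysis pipeline: the function names, the residue groupings (F, W, Y as aromatic; K, R as positive; D, E as negative; histidine treated as neutral), and the example peptide are assumptions chosen for illustration, and the sketch reports simple residue fractions rather than the charge-decoration measure used in the study.

```python
from collections import Counter

# Residue groupings are assumptions for illustration only.
AROMATIC = set("FWY")
POSITIVE = set("KR")
NEGATIVE = set("DE")


def occurrence_probabilities(seq):
    """Return the fraction of each amino acid in a single sequence."""
    seq = seq.upper()
    if not seq:
        raise ValueError("sequence must not be empty")
    counts = Counter(seq)
    return {aa: n / len(seq) for aa, n in counts.items()}


def residue_fraction(seq, residues):
    """Return the fraction of positions occupied by any residue in `residues`."""
    seq = seq.upper()
    if not seq:
        raise ValueError("sequence must not be empty")
    return sum(1 for aa in seq if aa in residues) / len(seq)


def summarize(seq):
    """Collect simple composition statistics for one sequence."""
    positive = residue_fraction(seq, POSITIVE)
    negative = residue_fraction(seq, NEGATIVE)
    return {
        "length": len(seq),
        "aromatic_fraction": residue_fraction(seq, AROMATIC),
        "charged_fraction": positive + negative,
        "net_charge_per_residue": positive - negative,
    }


if __name__ == "__main__":
    # Hypothetical peptide, not taken from the paper.
    example = "KKDEFWAAAA"
    print(occurrence_probabilities(example))
    print(summarize(example))
```

Applying `summarize` to each sequence in two generated sets and comparing the resulting distributions is one simple way to look for the kind of compositional differences the abstract reports.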

Publisher

Cold Spring Harbor Laboratory
