Affiliation:
1. Institute of Biochemistry and Molecular Medicine (IBMM), University of Bern, Bern CH-3012, Switzerland
2. Graduate School for Cellular and Biomedical Sciences (GCB), University of Bern, Bern CH-3012, Switzerland
Abstract
Motivation
Understanding protein thermostability is essential for numerous biotechnological applications, but traditional experimental methods are time-consuming, expensive, and error-prone. Recently, deep learning (DL) techniques from natural language processing (NLP) have been extended to the field of biology, since the primary sequence of a protein can be viewed as a string of amino acids that follows a physicochemical grammar.
Results
In this study, we developed TemBERTure, a DL framework that predicts thermostability class and melting temperature from protein sequences. Our findings emphasize the importance of data diversity for training robust models, in particular the inclusion of sequences from a wider range of organisms. Additionally, we suggest using attention scores from DL models to gain deeper insights into protein thermostability. Analyzing these scores in conjunction with the 3D protein structure can enhance understanding of the complex interactions among amino acid properties, their positioning, and the surrounding microenvironment. By addressing the limitations of current prediction methods and introducing new avenues for exploration, this research paves the way for more accurate and informative protein thermostability predictions, ultimately accelerating advances in protein engineering.
Availability and implementation
The TemBERTure model and the data are available at: https://github.com/ibmm-unibe-ch/TemBERTure.
Funder
Swiss National Science Foundation
SNSF
Publisher
Oxford University Press (OUP)