Author:
Thumuluri Vineet,Martiny Hannah-Marie,Armenteros Jose J. Almagro,Salomon Jesper,Nielsen Henrik,Johansen Alexander R.
Abstract
AbstractSolubility and expression levels of proteins can be a limiting factor for large-scale studies and industrial production. By determining the solubility and expression directly from the protein sequence, the success rate of wet-lab experiments can be increased.In this study, we focus on predicting the solubility and usability for purification of proteins expressed in Escherichia coli directly from the sequence. Our model NetSolP is based on deep learning protein language models called transformers and we show that it achieves state-of-the-art performance and improves extrapolation across datasets. As we find current methods are built on biased datasets, we curate existing datasets by using strict sequence-identity partitioning and ensure that there is minimal bias in the sequences.The predictor is available at https://services.healthtech.dtu.dk/service.php?NetSolP-1.0
Publisher
Cold Spring Harbor Laboratory
Cited by
2 articles.
订阅此论文施引文献
订阅此论文施引文献,注册后可以免费订阅5篇论文的施引文献,订阅后可以查看论文全部施引文献