Author:
Singh Gurpreet, Tyagi Ravi, Singh Anjana, Kapil Shruti, Parida Pratap Kumar, Scarcelli Maria, Dumitru Dan, Sathiyamoorthy Nanda Kumar, Phogat Sanjay, Essaghir Ahmed
Abstract
The prediction of bacterial protein Sub-Cellular Localization (SCL) is critical for antigen identification and reverse vaccinology, especially when determining protein localization in the lab is time-consuming, expensive, and not possible for all species. While PSORTb is one of the most widely used tools for predicting SCL, it has several limitations, including a tendency to label a large number of proteins as ‘Unknown’. To address these shortcomings, we present a protein language model (ProtLM.SCL) capable of predicting the subcellular localization of a given protein from Gram-negative bacteria. By performing 10-fold cross-validation on the PSORTb public data set, we demonstrate that ProtLM.SCL is more accurate and precise than PSORTb. When compared against empirically validated published data, our models also outperformed PSORTb, particularly when categorizing difficult cases.
Publisher
Cold Spring Harbor Laboratory
Cited by
2 articles.