Abstract
Inteins are proteins that excise themselves from host proteins and ligate the flanking polypeptides in an autocatalytic process called protein splicing. They are gaining momentum in synthetic biology for their ability to post-translationally modify proteins of interest. In nature, inteins are either contiguous or split; in the latter case, the two intein fragments must first form a complex before splicing can occur. So far, new split sites in inteins have been identified using heuristic methods. To make split site identification faster, easier and less costly, we developed Int&in, a web server that uses a Gaussian Naïve Bayes machine learning model to predict active and inactive split sites. The model was trained on a data set generated by us and validated on a large, diverse data set from the literature, achieving an accuracy of 0.76. Int&in will facilitate the engineering of novel split inteins for applications in biotechnology and synthetic biology.
Publisher
Cold Spring Harbor Laboratory