Abstract
The sigma70 factor plays a crucial role in prokaryotes and regulates the transcription of most housekeeping genes. One of the major challenges is to predict the sigma70 promoter, or sigma70 factor binding site, with high precision. In this study, we trained and evaluated our models on a dataset consisting of 741 sigma70 promoters and 1400 non-promoters. We generated a wide range of features, around 8000 in total, including Dinucleotide Auto-Correlation, Dinucleotide Cross-Correlation, Dinucleotide Auto Cross-Correlation, Moran Auto-Correlation, Normalized Moreau-Broto Auto-Correlation, Parallel Correlation Pseudo Tri-Nucleotide Composition, etc. Our SVM-based model achieved a maximum accuracy of 97.38% with an AUROC of 0.99 on the training dataset, using 200 selected features. To check the robustness of the model, we tested it on an independent dataset built from the latest version of RegulonDB (10.8), which included 1134 sigma70 and 638 non-sigma70 promoters, and achieved an accuracy of 90.41% with an AUROC of 0.95. We have developed a method, Sigma70Pred, which is available as a web server and as standalone packages at https://webs.iiitd.edu.in/raghava/sigma70pred/. The services are freely accessible.
Publisher
Cold Spring Harbor Laboratory