Affiliation:
1. School of Engineering, Dali University, Yunnan, China, 671003
2. College of Biotechnology, Tianjin University of Science & Technology, Tianjin 300457, PR China
Abstract
Abstract
Promoters are important cis-regulatory elements for the regulation of gene expression, and their accurate predictions are crucial for elucidating the biological functions and potential mechanisms of genes. Many previous prokaryotic promoter prediction methods are encouraging in terms of the prediction performance, but most of them focus on the recognition of promoters in only one or a few bacterial species. Moreover, due to ignoring the promoter sequence motifs, the interpretability of predictions with existing methods is limited. In this work, we present a generalized method Prompt (Promoters in multiple prokaryotes) to predict promoters in 16 prokaryotes and improve the interpretability of prediction results. Prompt integrates three methods including RSK (Regression based on Selected K-mer), CL (Contrastive Learning) and MLP (Multilayer Perception), and employs a voting strategy to divides the datasets into high-confidence and low-confidence categories. Results on the promoter prediction tasks in 16 prokaryotes show that the accuracy (Accuracy, Matthews correlation coefficient.) of Prompt is greater than 80% in highly credible datasets of 16 prokaryotes, and is greater than 90% in 12 prokaryotes, and Prompt performs the best compared with other existing methods. Moreover, by identifying promoter sequence motifs, Prompt can improve the interpretability of the predictions. Prompt is freely available at https://github.com/duqimeng/PromptPrompt, and will contribute to the research of promoters in prokaryote.
Motivation:Promoters are important cis-regulatory elements for the regulation of gene expression, and their accurate predictions are crucial for elucidating the biological functions and potential mechanisms of genes. Many previous prokaryotic promoter prediction methods are encouraging in terms of the prediction performance, but most of them focus on the recognition of promoters in only one or a few bacterial species. Moreover, due to ignoring the promoter sequence motifs, the interpretability of predictions with existing methods is limited.
Results: Results on the promoter prediction tasks in 16 prokaryotes show that the accuracy (Accuracy, Matthews correlation coefficient.) of Prompt is greater than 80% in highly credible datasets of 16 prokaryotes, and isgreater than 90% in 12 prokaryotes, and PromptPrompt performs the best compared with other existing methods.
Availability:Moreover, by identifying promoter sequence motifs, PromptPrompt can improve the interpretability of the predictions. Prompt is freely available at https://github.com/duqimeng/PromptPrompt, and will contribute to the research of promoters in prokaryote.
Supplementary information: Supplementary data are available at Bioinformaticsonline.
Publisher
Research Square Platform LLC
Reference48 articles.
1. Where to begin? Sigma factors and the selectivity of transcription initiation in bacteria;Helmann JD;Molecular Microbiology,2019
2. Sigma factors in a thousand E. coli genomes;Cook H;Environmental Microbiology,2013
3. Compilation and analysis of Escherichia coli promoter DNA sequences;Hawley DK;Nucleic acids research,1983
4. Protein family review - The sigma(70) family of sigma factors;Paget MSB;Genome Biology,2003
5. The regulation of bacterial transcription initiation;Browning DF;Nature Reviews Microbiology,2004