Abstract
Small proteins play diverse and essential roles in bacterial physiology and virulence. Despite their importance, automated genome annotation algorithms still cannot accurately annotate all small open reading frames (sORFs): sORFs usually provide insufficient sequence information for domain and homology searches, tend to be species specific, and only a few experimentally validated examples are covered in standard proteomics studies. The accuracy and reliability of genome annotations, particularly for sORFs, can be significantly improved by integrating protein evidence from experimental approaches that enrich for small proteins. Here we present a highly optimized and flexible workflow for bacterial proteogenomics that covers all steps from (i) creation of protein databases, (ii) database searches, and (iii) peptide-to-genome mapping to (iv) result interpretation, and whose automated execution is supported by two open-source tools (SALT & Pepper). We used the workflow to identify high-quality peptide spectrum matches (PSMs) for both annotated and unannotated small proteins (≤ 100 aa; SP100) in Staphylococcus aureus Newman. Proteins isolated from cells at the exponential and stationary growth phases were digested with different endopeptidases (trypsin, Lys-C, AspN); the resulting peptides were fractionated by gel-based and gel-free methods and measured with highly sensitive mass spectrometers. PSMs or sORF predictions from sORFfinder were stringently filtered, allowing us to detect 185 soluble SP100, 69 of which were missing from the genome annotation used. Most interestingly, almost half of the identified SP100 were basic, suggesting a role in binding to more acidic molecules such as nucleic acids or phospholipids. In addition, phage-related functions were proposed for 30 SP100, based on the localization of their coding sequences in the genome.
Publisher
Cold Spring Harbor Laboratory
Cited by
1 article.